If it is between one of the poor of your city and one of the poor of another city, the one of the poor of your city takes precedence. (Talmud, Baba Metzia 71a).
Don’t get me wrong; I don’t care much about what the so-called “sages” of old have said, or about how their opinions are radically different yet simultaneously valid, like quantum “Talmudic” perspectives that sometimes collapse under their own claims. For a Universe so ordered and mathematical, religious commentaries are notoriously speculative and relativistic. But what isn’t, if there are humans around? This saying stuck with me, like some tool I bought to fix something in the house but never had the chance to use properly. You remember the tool’s closeted existence, and you contemplate justifying the initial investment.
The year is 1970, my father is 11, my mother is 8, and Gil Scott-Heron releases a song called Whitey on the Moon.
(…)
A rat done bit my sister Nell
With whitey on the moon
Her face and arms began to swell
And whitey’s on the moon
I can’t pay no doctor bills
But whitey’s on the moon
(…)
How can anyone be so offended by this lovely scientific trip to the moon and back? It’s our shared victory; just like in sports, the human team scored against the vast, cold, at times extremely hot, and viciously inhospitable (team) Universe. It’s so empty; when it’s not, it wants to break your chemistry. But we’ve just conquered Space by travelling roughly one light-second to excellence, to the moon and back.
Thirty-five years later, another American musician, Darondo, released a brilliant album and a song called Let My People Go. Same topic, similar lyrics, more music (in the classical sense) than spoken poetry:
Man, to your rocket ship
Take you to the moon
A million-dollar mission
To bring back a piece of rock
We got starvation, panic across the land
And here’s a fool in a rocket ship
Trying to be Superman
And then I realised that going to space is like feeding the poor of another city without taking care of your own. Which city?
But what if cities are within cities, and where do their borders end?
Furthermore, if you are a king ruling over a kingdom with multiple cities, which one is your city? The capital city? The Palace? Your vast bed chambers or the poor thoughts running through your head?
What if you live alone? People who live alone can’t build physical rocket ships. No offence, please, people who live alone.
What if all the people of your city are poor, but they don’t realise it? They are flawed in ways that are not obvious to an uneducated eye, and they are, in fact, uneducated. But does it matter how educated your eye is if we’ve got starvation and panic across the land? Darondo asks.
Is it a sin to invest time and money in something so immediately useless as going to the moon without properly fixing your city first? Rocks are boring, the moon has almost no atmosphere, the view is not great, and the “climbing” (or “diving”) equipment is quite heavy even in low gravity.
Is this trip as “useless” as being a mathematician in centuries past, spending unhealthy amounts of time studying prime numbers while the poor wore wooden sandals and died of malnutrition, wars and pneumonia? After all, who needs your primes and zeta functions when there is not enough bread on the tables or wheat in the granaries?
What if the poor are not poor? Whom should you help first?
What if you are poor in your city? In this case, by the Talmudic advice based on proximity, you’d better help yourself first.
I don’t have the answer, but it’s an exciting topic to meditate on. Should we science our way in directions so remote from eradicating poverty, or should we dedicate everything we have to fix our world, and the rest can wait? But why hurry?
At least Bowie’s spiders haven’t sold the world.
I did this exercise on two popular platforms where I still have active accounts. One is the most extensive professional network, where people still debate the advantages of working from home versus returning to the office. The second is the platform our parents and early (as in older) millennials hold dear.
The results were underwhelming, but I wanted to scroll more; maybe something of interest would pop out from nowhere to comfort my sensation of void and emptiness. I was thinking about how my fingers are so accustomed to the inertia of scrolling down that they don’t even listen when asked to stop.
The wall on the professional network is almost sad. People generally love to take themselves too seriously in professional environments. This weakness is projected in how they act and talk. It’s like they identify too much with their jobs, and they let themselves be affected by this phenomenon.
Let me list my findings:
And here I’ve stopped “analysing” my personalised wall. The algorithm failed to display valuable content. It’s Static Noise, with modest spikes of statistical relevance. So I’ve decided to quit mainstream social media sites. This time for good. Hopefully. Time is a valuable resource. Life is short. Social Media feeds on our time.
Social media, I quit!
This post is satire. It may also not make too much sense.
Today I’ve seen this picture on my LinkedIn wall:
And it triggered me badly. There are people in my network highly sensitive to the divine nature of numbers. As a math enthusiast, I don’t get it.
So, for all the people believing in numerology, let’s find the numbers that carry highly amplified and/or uber-vibrational powers, more than you can imagine.
Keep in mind: 37 is super-weak compared to 15873:
The magic of the number 15873
15873 x 7 = 111111
15873 x 14 = 222222
15873 x 21 = 333333
15873 x 28 = 444444
15873 x 35 = 555555
15873 x 42 = 666666
15873 x 49 = 777777
15873 x 56 = 888888
15873 x 63 = 999999
(As you can see, 15873 times a multiple of 7 gives a sextuple Rep Digit, short for Repeating Digit, which carries amplified or vibrational powers.)
But let me tell you, even 15873 is super-weak compared to other numbers that carry uber-highly-amplified and/or uber-highly-vibrational powers. So let’s write a small Python script that can list numbers even more vibrationally powerful (whatever that is) than 37 and 15873.
from primefac import primefac
from collections import Counter
import sys

def compute_vibrational_num(vp):
    n, res, vp_ret = 1, 0, vp
    # Build 1001001...001 (vp+1 ones), so that n * 111 has 3*(vp+1) ones
    while vp != 0: n, vp = 1000 * n + 1, vp - 1
    n *= 111
    # Prefer 7 (or its power) if it is among the two most common prime
    # factors, otherwise take the most common one
    primef = Counter(list(primefac(n))).most_common(2)
    primef = primef[1 if primef[1][0] == 7 else 0]
    magic = primef[0] ** primef[1]
    res=n//magic
    return (magic, int(res), vp_ret)

def print_text(tpl):
    print(f"The magic of the number {tpl[1]}")
    for i in range(1,10):
        print(f"{tpl[1]} x {tpl[0]*i} = {tpl[1] * tpl[0] * i}")
    print(f"""
(As you can see, {tpl[1]} times a multiple of {tpl[0]}, gives a {(tpl[2]+1)*3}nthuple Rep Digit,
short for Repeating Digit which carry amplified or vibrational power.
""")
Yes, I know, it looks bad, and you also have to install a pip package called primefac to do the hard work for us. Now, the vp parameter of our function gives us the vibrational power (whatever that is) of the number.
For example, if we call print_text(compute_vibrational_num(0)), we get the results from the picture (that’s boring):
The magic of the number 37
37 x 3 = 111
37 x 6 = 222
37 x 9 = 333
37 x 12 = 444
37 x 15 = 555
37 x 18 = 666
37 x 21 = 777
37 x 24 = 888
37 x 27 = 999
(As you can see, 37 times a multiple of 3, gives a 3nthuple Rep Digit,
short for Repeating Digit which carry amplified or vibrational power.
But if we increase the vibrational power of the number to, let’s say, vp=13, we get something significantly more powerful:
The magic of the number 2267573696145124716553287981859410430839
2267573696145124716553287981859410430839 x 49 = 111111111111111111111111111111111111111111
2267573696145124716553287981859410430839 x 98 = 222222222222222222222222222222222222222222
2267573696145124716553287981859410430839 x 147 = 333333333333333333333333333333333333333333
2267573696145124716553287981859410430839 x 196 = 444444444444444444444444444444444444444444
2267573696145124716553287981859410430839 x 245 = 555555555555555555555555555555555555555555
2267573696145124716553287981859410430839 x 294 = 666666666666666666666666666666666666666666
2267573696145124716553287981859410430839 x 343 = 777777777777777777777777777777777777777777
2267573696145124716553287981859410430839 x 392 = 888888888888888888888888888888888888888888
2267573696145124716553287981859410430839 x 441 = 999999999999999999999999999999999999999999
(As you can see, 2267573696145124716553287981859410430839 times a multiple of 49, gives a 42nthuple Rep Digit,
short for Repeating Digit which carry amplified or vibrational power.
I hope you find the most vibrational number your hardware allows.
Repdigits are natural numbers composed of repeated instances of the same digit. The coolest repdigits are the Mersenne primes: prime numbers of the form \(2^{p}-1\) that, when represented in binary, are composed only of 1s.
All in all, let’s get back to our powerful vibrational numbers (whatever that means).
Numbers like 11...1 can sometimes be prime, but if the number of 1 digits is a multiple of 3, we know they aren’t. It’s a simple proof:
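If we group the digits 3 by 3, we get the following relationship for a number made of \(3k\) ones:

\(\underbrace{11...1}_{3k}=111*10^{3(k-1)}+111*10^{3(k-2)}+...+111*10^{3}+111\)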
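And then, if we use 111 as a common factor (with \(3k\) being the number of 1 digits), we obtain:

\(\underbrace{11...1}_{3k}=111*(10^{3(k-1)}+10^{3(k-2)}+...+10^{3}+1)\)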
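But \(111=3 * 37\), so numbers made only of 1s, whose number of digits is a multiple of \(3\), are always divisible by both \(3\) and \(37\). As an interesting observation, they are sometimes divisible by \(7\) as well: exactly when the number of digits is a multiple of \(6\) (for example, \(111111=7 * 15873\)).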
This is how our code functions:
- it builds the number 111..111, with \(3(vp+1)\) digits of 1;
- it picks 3 or 7 (or a power of 7, like 49) as the meme multiplier.

This tutorial is an early draft. If you see any errors, feedback is greatly appreciated.
Bitwise operations are a fundamental part of Computer Science. They help Software Engineers gain a deeper understanding of how computers represent and manipulate data, and they are crucial when writing performance-critical code. Truth be told, they are rarely used in the business code we write nowadays; they stay hidden in libraries, frameworks, or low-level system programming codebases. The reason is simple: writing code that operates on bits can be tedious, less readable, not always portable, and, most importantly, error-prone. Modern programming languages have higher-level abstractions that replace the need for bitwise operations and “constructs”, and trading (potential) small performance and memory gains for readability is not such a bad deal. Plus, modern compilers are smart and can optimise your code in ways you (and I) cannot even imagine.
To illustrate my point: not so long ago, I wrote a snake game in C that uses only bitwise operations and squeezes everything into a handful of uint32_t and uint64_t variables. The results (after macro expansion) are not that readable, even for an initiated eye.
In any case, this article is not about why we shouldn’t ever touch them; on the contrary, it is about why they are cool and how they can make specific code snippets orders of magnitude faster than the “higher-level-readable-modern approach”. If you are a programmer who enjoys competitive programming, knowing bitwise operations (in case you don’t know about them already) will help you write more efficient code.
Again, knowing how to deal with bitwise operations is necessary if you plan a career in systems programming, network programming or embedded software development.
Nature gifted humankind ten fingers. As a direct consequence of Nature’s decision, our Math (and numbers) are almost always expressed in base 10. If an alien species (with eight fingers) discovers mathematics, they will probably use base 8 (octal) to represent their numbers. Meanwhile, computers love base 2 (binary) because computers have only two fingers: 1 and 0, or one and none.
The Mayan numeral system was the system to represent numbers and calendar dates in the Maya civilization. It was a vigesimal (base-20) positional numeral system.
In mathematics, a base refers to the number of distinct symbols we use to represent and store numbers.
In our case (decimal), those symbols are 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. We must “recombine” the existing symbols to express larger numbers. For example, 127 is written by re-using 1, 2, and 7. The three symbols are combined to express a greater quantity than we can count on our fingers.
By far, the most popular number system bases are:
Number System | Base | Symbols |
---|---|---|
Binary | 2 | [0, 1] |
Octal | 8 | [0, 1, 2, 3, 4, 5, 6, 7] |
Decimal | 10 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] |
Hexadecimal | 16 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F] |
To make things more generic, if \(b\) is the base, we write the natural number \(a\) in base \(b\) (notation: \(a_{b}\)) using the formula \(a_{b}=a_{0}*b^{0}+a_{1}*b^{1}+a_{2}*b^{2}+...+a_{n}*b^{n}\), where \(a_{n}\), \(a_{n-1}\), …, \(a_{2}\), \(a_{1}\), \(a_{0}\) are the symbols (digits) in descending order of their position, and \(0 \le a_{i} \lt b\).
For example, 1078 in base 10 (\(b=10\), so \(a_{i} \in \{0,1,2,3,4,5,6,7,8,9\}\)) can be written as:

\(1078_{10}=8*10^{0}+7*10^{1}+0*10^{2}+1*10^{3}\)
If we were to change the base and write 1078 from base 10 to base 7, then \(b=7\) and \(a_{i} \in \{0,1,2,3,4,5,6\}\) (we only have seven fingers, numbered from 0 to 6):

\(1078_{10}=0*7^{0}+0*7^{1}+1*7^{2}+3*7^{3}=3100_{7}\)
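If we change the base and write 1078 from base 10 to base 2, then \(b=2\) and \(a_{i} \in \{0,1\}\):

\(1078_{10}=0*2^{0}+1*2^{1}+1*2^{2}+0*2^{3}+1*2^{4}+1*2^{5}+0*2^{6}+0*2^{7}+0*2^{8}+0*2^{9}+1*2^{10}=10000110110_{2}\)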
As we’ve said earlier, computers store numbers in binary. To better visualise what that looks like in memory, let’s take a look at the following diagram:
As you can see, to identify the bits (the sequence of zeroes and ones which are the acceptable symbols in binary) comprising the number, we have to find an algorithm to determine the numbers \(a_{i}\). Luckily, such an algorithm exists, and it’s straightforward to implement. It works the same, no matter what base we pick.
Based on the above picture, another important observation is that to represent the number 1078 in binary, we need at least eleven memory cells (bits): the most significant power of 2 used is \(2^{10}\), so we need one bit for each of the positions 0 through 10. As a rule of thumb, the fewer symbols our base has, the more we have to repeat existing symbols. If we want to go extreme and pick b=1, we get the Unary Numeral System, where representing a number N is equivalent to repeating the unique symbol of the system N times.
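The algorithm for converting a number to any base \(b\) is as follows:
- divide the number by \(b\), and keep both the quotient and the remainder;
- divide the quotient by \(b\) again, keeping the new quotient and remainder;
- repeat the previous step; the quotient keeps getting smaller, so it becomes 0 at some point, and then we stop.

The base \(b\) representation of the decimal number is the sequence of remainders (in reverse order).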
For example, let’s convert 35 to base 2:
After applying the algorithm, \(35_{10}=100011_{2}\). It’s easy to test if things are correct: we multiply each bit by its corresponding power of b=2 and add everything up: \(35_{10}=1*2^{5}+0*2^{4}+0*2^{3}+0*2^{2}+1*2^{1}+1*2^{0}\).
Converting a number from the decimal system to the hexadecimal number system is a little bit trickier. The algorithm remains the same, but because the hexadecimal system has 16 symbols and we only have ten digits (0, 1, …, 9), we need to add additional characters to our set: the letters from A to F. A corresponds to 10, B corresponds to 11, C corresponds to 12, D corresponds to 13, E corresponds to 14, and F corresponds to 15.
For example, let’s convert 439784 to hexadecimal to see how it looks:
As you can see, \(439784_{10}=6B5E8_{16}\). Another popular notation for hexadecimal numbers is 0x6B5E8; you will see the 0x prefix before the number. Similarly, for binary, there’s the 0b prefix before the actual number representation (standard C only gained 0b literals in C23, although GCC and Clang have long accepted them as an extension).
Because numbers in the binary numerical system take so much “space” to be represented, you will rarely see them printed in binary, but you will see them in hexadecimal.
Personally, when I have to “translate” from binary to hexadecimal, and vice-versa, I don’t apply any “mathematical” algorithm. There’s a simple “visual” correspondence we can use:
As you can see, each symbol from the hexadecimal format can be represented as a sequence of 4 bits in binary: 8 is 1000, E is 1110, and so on. When you concatenate the 4-bit groups, you get the binary representation of the hexadecimal number. The reverse operation also works: split the binary number into groups of 4 bits, starting from the right. With a little bit of experience (no pun intended), you can do the “transformation” in your head and become one of the guys from The Matrix.
If you don’t have experience with the hexadecimal number systems, write the digits on a piece of paper a few times until you memorise them:
Numbers, especially when represented in binary, have a certain symmetry associated with them. The most obvious pattern is the way odd and even numbers alternate: even numbers have 0 as their last (least significant) bit, and odd numbers have 1:
There’s no magic; this is the way numbers operate. If we move one column to the left (the bits corresponding to \(2^1\)), you will see groups of two (\(2^1\)) bits alternating: 00 alternates with 11.

If we move one more column to the left (to the bits corresponding to \(2^2\)), you will see groups of four (\(2^2\)) bits alternating: 0000 alternates with 1111.

If we move just another column to the left (to the bits corresponding to \(2^3\)), you will see groups of eight (\(2^3\)) bits alternating: 00000000 alternates with 11111111.
Another interesting way to look at the numbers in binary is to “cut” their representation in half and observe a “mirroring” effect:
If we were to use our imagination, we could even fold the “bit surface” and get a “surface” made only of 1 bits, as the upper chunk fills the gaps in the lower one:
Another interesting pattern is looking at a “ladder” forming up, where each step is double the size of the previous one (look at the green line from the image below):
“The ladder” changes its step whenever it encounters a power of two. Also, if you look closer, every power of two has a single 1 bit, located at the power’s position in the number.
The C programming language provides numeric data types to store numbers in the computer’s memory. As previously mentioned, they are stored in binary (as a sequence of zeroes and ones). I am sure you’ve heard about char, int, unsigned int, long, long long, float, etc. If you want to refresh your knowledge in this area, this Wikipedia article is more than enough. The biggest problem with the “classic” types is that their size can differ from one machine to another.
For example, char is defined in the standard as an integer type (that can be signed or unsigned) that contains CHAR_BIT bits of information. On most machines, CHAR_BIT is 8, but the standard only guarantees that it is at least 8. There are machines (some digital signal processors, for example) where, for reasons beyond the scope of this article, CHAR_BIT is 16 or even 32. Working on the bits of a char and assuming there are 8 of them (99.99% of the cases) would create portability problems on the much rarer systems where CHAR_BIT is bigger. (Note: CHAR_BIT is a macro defined in limits.h.)
The same goes for the typical int. In the C standard, int doesn’t have a fixed size in terms of the bits it contains, only a lower bound: it must be at least 16 bits long. On my machine it is 32, so, again, portability issues are in sight.
With C99, new fixed-width integer types were introduced to increase the portability of the software we write. They are defined in the header file stdint.h (which inttypes.h also includes). Those are the types I prefer to use nowadays when I write C code:
- int8_t: signed integer with 8 bits;
- int16_t: signed integer with 16 bits;
- int32_t: signed integer with 32 bits;
- int64_t: signed integer with 64 bits.

For each intN_t signed integer, there is also a uintN_t counterpart (unsigned integer, N=8,16,32,64). Because their size is the same on every platform that provides them, we will use the fixed-size integers from stdint.h in our code.
Leaving signed integers aside for a moment (we will discuss later how negative numbers are represented), if we were to visually represent uint8_t, uint16_t and uint32_t (skipping uint64_t), they would look like this:
A uint8_t variable reaches its maximum value when all its bits are set to 1:
To determine the maximum unsigned integer we can hold in a variable of type uint8_t, we add all the powers of two like this:

\(1*2^{0}+1*2^{1}+1*2^{2}+1*2^{3}+1*2^{4}+1*2^{5}+1*2^{6}+1*2^{7}=1+2+4+8+16+32+64+128=255\)
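Or, we can use the formula \(\sum_{i=0}^{n} 2^i =2^{n+1}-1\). An N-bit unsigned integer uses the powers \(2^{0}\) to \(2^{N-1}\), so its maximum value is \(2^{N}-1\). For each uintN_t we can come up with this table: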
Unsigned Fixed Type | Maximum Value | C Macro |
---|---|---|
uint8_t | \(2^{8}-1=255\) | UINT8_MAX |
uint16_t | \(2^{16}-1=65535\) | UINT16_MAX |
uint32_t | \(2^{32}-1=4294967295\) | UINT32_MAX |
uint64_t | \(2^{64}-1=18446744073709551615\) | UINT64_MAX |
Yes, you’ve read that right: there are also macros for all the maximum values. When you are programming, you don’t have to compute anything yourself; everything is already available as macro constants (if such a thing exists):
#include <stdio.h>
#include <stdint.h> // macros are included here
int main(void) {
    printf("%hhu\n", UINT8_MAX);
    printf("%hu\n", UINT16_MAX);
    printf("%u\n", UINT32_MAX);
    printf("%llu\n", UINT64_MAX);
    return 0;
}
Output:
255
65535
4294967295
18446744073709551615
In the code above, there’s one slight inconvenience: %hhu, %hu, %u, etc. are not the right formats for the fixed-width types. The right formats are defined in inttypes.h as macros:
#include <stdio.h>
#include <inttypes.h> // PRIu8, PRIu16, ... are defined here (it also includes stdint.h)
int main(void) {
    printf("%"PRIu8"\n", UINT8_MAX);
    printf("%"PRIu16"\n", UINT16_MAX);
    printf("%"PRIu32"\n", UINT32_MAX);
    printf("%"PRIu64"\n", UINT64_MAX);
    return 0;
}
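Funnily enough, on clang I get warnings for using those formats, while on gcc everything compiles just fine, without any warnings. As the notes in the output below show, UINT8_MAX and UINT16_MAX expand to plain int constants (255 and 65535), so clang complains that the argument type doesn’t match the unsigned char (hh) and unsigned short (h) length modifiers: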
bits.c:300:26: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
printf("%"PRIu8"\n", UINT8_MAX);
~~~~~~~ ^~~~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdint.h:107:27: note: expanded from macro 'UINT8_MAX'
#define UINT8_MAX 255
^~~
bits.c:301:27: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat]
printf("%"PRIu16"\n", UINT16_MAX);
~~~~~~~~ ^~~~~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdint.h:108:27: note: expanded from macro 'UINT16_MAX'
#define UINT16_MAX 65535
For this exercise, we will write a C function that takes a uint16_t and prints its representation in other numerical systems to stdout.
For everything bigger than base 10, we will use the letters of the alphabet. If the base is bigger than 36 (10 digits + 26 letters), we will print an error to stderr. We will start by defining an “alphabet” of symbols that maps every number from 0..35 to the digits and letters that we have available:
#define MAX_BASE 36
char symbols[MAX_BASE] = {
'0', '1', '2', '3', '4', '5', '6', '7', '8',
'9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q',
'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'
};
// For 0, symbols[0] = '0'
// ...
// For 11, symbols[11] = 'B'
// ...
// For 35, symbols[35] = 'Z'
The next step is to write a function that implements the basic algorithm described in the first section of the article.
#include <stdio.h>
#include <stdint.h>
#define MAX_BASE 36
char symbols[MAX_BASE] = { /** numbers and letters here */ };
void print_base_iter1(uint16_t n, uint8_t base) {
    // Sanity check: base 36 is still valid (remainders 0..35), while
    // bases 0 and 1 would divide by zero or never terminate
    if (base < 2 || base > MAX_BASE) {
        fprintf(stderr, "Base %d is outside the accepted range [2, %d]\n", base, MAX_BASE);
        return;
    }
    uint16_t r;
    do {                                   // do-while, so that 0 still prints "0"
        r = n % base;                      // Compute the remainder
        n /= base;                         // Divide by base again
        fprintf(stdout, "%c", symbols[r]); // Print the symbol associated with the remainder
    } while (n > 0);                       // Stop when the quotient reaches 0
}
Everything looks good, but if we run the function, we will see a slight inconvenience: the symbols are printed in reverse order. For example, calling print_base_iter1(1078, 2); will yield the result 01101100001, which is technically correct, but only if we read the number from right to left or use a mirror. Jokes aside, the correct answer is 10000110110.
Now let’s convert a number from decimal to hexadecimal to see some letters, by calling print_base_iter1(44008, 16);. The result given by our function is 8EBA; again, if we read it from right to left, we get the correct result, ABE8.
To fix this inconvenience, we can write the results into an intermediary char* (string) to control the order in which we show the characters. Or we can use a Stack data structure, where we push the remainders individually and then print them as we pop them out.
A simpler solution is to use recursion and the only stack real programmers use, the call stack (that was a joke!):
#include <stdio.h>
#include <stdint.h>
#define MAX_BASE 36
char symbols[MAX_BASE] = { /** */ };
static void print_base_rec0(uint16_t n, uint8_t base) {
    if (n > 0) {
        uint16_t r = n % base;
        print_base_rec0(n / base, base); // calls the function again
        // printing the character on the next line
        // doesn't happen until the recursive call
        // has finished, so the most significant digits come first
        fprintf(stdout, "%c", symbols[r]);
    }
}
void print_base_rec(uint16_t n, uint8_t base) {
    if (base < 2 || base > MAX_BASE) {
        fprintf(stderr, "Base %d is outside the accepted range [2, %d]\n", base, MAX_BASE);
        return;
    }
    if (n == 0) {          // the recursion alone would print nothing for 0
        fprintf(stdout, "%c", symbols[0]);
        return;
    }
    print_base_rec0(n, base);
}
To simplify things, C supports hexadecimal literals (binary literals only became standard in C23), so we can assign numbers in hexadecimal to variables. In C, a hexadecimal literal is written with the prefix 0x (or 0X) followed by one or more hexadecimal symbols (digits). Both uppercase and lowercase letters work.
For example, we can write:
uint32_t x = 0x3Fu; // 0x3F is 63
// another way of writing:
//
// uint32_t x = 63
uint32_t y = 0xABCDu; // 0xABCD is 43981
// another way of writing:
//
// uint32_t y = 43981
We can also print the hexadecimal representation of a number using "%X" (uppercase letters) or "%x" (lowercase letters) as the format specifier:
#include <stdio.h>
int main(void) {
    printf("%x\n", 63);
    printf("%X\n", 43981);
    return 0;
}
// Output:
// 3f
// ABCD
Printing numbers in hexadecimal also allows us to insert easter eggs in our code base. For example, this simple line can act as a warning for developers about to join your project:
printf("%x %x %x %x\n", 64206, 10, 2989, 49374);
// Output:
// face a bad c0de
Unfortunately, standard C had no binary literals before C23…
Bitwise operations are mathematical operations that manipulate the individual bits of a number (or set of numbers). As the name suggests, they work on bits.
The operations are:
Symbol | Operation |
---|---|
& | bitwise AND |
\| | bitwise OR |
^ | bitwise XOR |
~ | bitwise NOT |
Additionally, you have two more operations to shift bits (right or left) inside a number.
Symbols | Operation |
---|---|
<< | left SHIFT |
>> | right SHIFT |
If we apply one of the binary operators to two numbers, A and B, we obtain a third number, C, where C = A OP B.
If \(a_{7}, a_{6}, ..., a_{0}\) are the bits composing A, \(b_{7}, b_{6}, ..., b_{0}\) are the bits composing B, and \(c_{7}, c_{6}, ..., c_{0}\) are the bits composing C, then we can say that:

\(c_{i}=a_{i} \text{ OP } b_{i}\), for every \(i \in \{0,1,...,7\}\)
In the C programming language, the bitwise AND operator, denoted as & (not to be confused with &&), is a binary operator that takes two integer operands and returns an integer. The operation is performed for each pair of corresponding bits of the operands. The result is a new integer in which each bit position is set to 1 only if the corresponding bits of both operands are also 1. Otherwise, the result bit is set to 0.
Let’s give it a try in code:
uint8_t a = 0x0Au, b = 0x0Bu;
printf("%x", a&b);
// Output
// a
Explanation: 0x0A is 0b00001010, while 0x0B is 0b00001011, and if we were to put the bits side by side and apply & between them, we would get the following:
As you can see, only the positions where both bits are 1 keep a 1, so the result is 0x0A.
Trying to apply bitwise operations to double or float types won’t work:
double a = 0.0;
printf("%x", a&1.0); // does not compile: & needs integer operands
Error:
bits.c:120:19: error: invalid operands to binary & (have 'double' and 'double')
120 | printf("%x", a&1.0);
| ^
One thing to take into consideration is the fact that & is both associative and commutative.
The associative property means that the grouping of operands does not affect the result. So, if we have three or more operands, we can group them in any way we choose, but the result will remain the same:
// Associativity "smoke test"
uint8_t a=0x0Au, b=0x30u, c=0x4Fu;
printf("%s\n", (((a&b)&c) == (a&(b&c))) ? "True" : "False");
// Output:
// True
Visually, it’s quite an intuitive property, so let’s put a=0x0A, b=0x30, and c=0x4F side by side and see what things look like:
No matter how we group the operands, the result will always be the same, 0x00, because there’s no column containing only 1 bits. A single 0 in a column invalidates everything. Isn’t this demotivational?
The commutative property means that the order of operands doesn’t affect the result. So, for example, writing a&b renders the same result as writing b&a.
// Commutativity "smoke test"
uint8_t a=0x0Au, b=0x30u;
printf("%s\n", ((a&b)==(b&a)) ? "True" : "False");
// Output:
// True
The bitwise OR (with its symbol: |) is a binary operator that compares the corresponding bits of two integer operands and produces a new value in which each bit is set to 1 if either (or both) of the corresponding bits in the operands are 1.
Again, let’s try using | in our code:
uint8_t a = 0xAAu, b=0x03u;
printf("%x", a|b);
// Output
// ab
Explanation: 0xAA is 0b10101010, while 0x03 is 0b00000011. If we put the two numbers side by side and apply | to their bits, we get the result 0xAB. Visually, things look like this:
If we look at the columns, wherever there’s at least one 1 bit, the result on that column will be 1, regardless of any 0 bits.
Just like &, | is both associative and commutative. Demonstrating this is outside the scope of this article, but think of it like this: if there’s a single 1 in a column, no matter how many 0 bits we may encounter, the result will always be 1. A single 1 has the power to change everything. Isn’t this motivational?
The bitwise XOR operator (^) is a binary operator that compares the corresponding bits of two operands and returns a new value where each bit is set to 1 if the corresponding bits of the operands are different, and 0 if they are the same.
Two identical numbers, a and b, will always XOR to 0 because all the matching bits cancel each other out. So, if a==b then a^b==0:
uint8_t a = 0xAFu, b=0xAFu;
printf("a==b is %s\n", (a==b) ? "True" : "False");
printf("a^b=0x%x\n", a^b);
// Output
// a==b is True
// a^b=0x0
Because we like patterns, we can also check that 0xAA ^ 0x55 == 0xFF; visually, it’s more satisfying than any other example I could think of:
Like & and | before, ^ is an associative and commutative operation. So, another useless but interesting observation we can make is that XORing all the numbers from 0 up to (but excluding) a power of two is always 0, as long as that power of two is at least 4 (for 2, we get 0 ^ 1 == 1). Philosophically speaking, XOR is the killer of symmetry:
#include <stdio.h>
#include <stdint.h>
void xoring_power_two(void) {
    // An array containing a few powers of 2
    uint8_t pof2[4] = {4, 8, 16, 32};
    // For each power of two
    for (int i = 0; i < 4; i++) {
        uint8_t xored = 0;
        // XOR all numbers < the current power of two
        for (int j = 0; j < pof2[i]; j++) {
            // Print "^ " after every number except the last one
            printf("0x%x %s", j, (j != pof2[i] - 1) ? "^ " : "");
            xored ^= j;
        }
        // Print the final result `= xored`
        printf("= 0x%x\n", xored);
    }
}
// Output
// 0x0 ^ 0x1 ^ 0x2 ^ 0x3 = 0x0
// 0x0 ^ 0x1 ^ 0x2 ^ 0x3 ^ 0x4 ^ 0x5 ^ 0x6 ^ 0x7 = 0x0
// 0x0 ^ 0x1 ^ 0x2 ^ 0x3 ^ 0x4 ^ 0x5 ^ 0x6 ^ 0x7 ^ 0x8 ^ 0x9 ^ 0xa ^ 0xb ^ 0xc ^ 0xd ^ 0xe ^ 0xf = 0x0
// 0x0 ^ 0x1 ^ 0x2 ^ 0x3 ^ 0x4 ^ 0x5 ^ 0x6 ^ 0x7 ^ 0x8 ^ 0x9 ^ 0xa ^ 0xb ^ 0xc ^ 0xd ^ 0xe ^ 0xf ^ 0x10 ^ 0x11 ^ 0x12 ^ 0x13 ^ 0x14 ^ 0x15 ^ 0x16 ^ 0x17 ^ 0x18 ^ 0x19 ^ 0x1a ^ 0x1b ^ 0x1c ^ 0x1d ^ 0x1e ^ 0x1f = 0x0
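If we picture this in our heads, this result is not surprising: below \(2^{k}\), every bit column contains exactly \(2^{k-1}\) ones (half of the numbers), which is an even count when \(k \ge 2\). Every 1 bit has a pair, each pair XORs to 0, XORing 0 with 0 is 0, and everything reduces to nothing.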
In the C programming language, the bitwise NOT is a unary operator denoted by the ~ character. It works on a single operand, negating all the operand’s bits by changing 1 to 0 and 0 to 1.
Negating 0b0001
is 0b1110
, negating 0b0000
is 0b1111
and so on…
For example:
uint16_t a = 0xAAAAu; // a = 1010 1010 1010 1010 == 0xAAAA
uint16_t b = ~a; // b = 0101 0101 0101 0101 == 0x5555
printf("0x%X\n", a);
printf("0x%X\n", b);
// Output
// 0xAAAA
// 0x5555
And visually things look like this:
The left shift operation is a bitwise operation, denoted with the symbols <<, that shifts the bits of a binary number to the left by a specified number of positions.
So, for example, if we want to shift the bits of 0b00000010 (or 0x02) by three positions, we can write something like this:
uint8_t a = 0x02u; // 0b 0000 0010 = 0x02
uint8_t b = a << 3; // 0b 0001 0000 = 0x10
printf("After shift: 0x%x\n", b);
// Output
// After shift: 0x10
Visually things look like this:
When shifting bits to the left, the bits that “fall off” the left end are lost, and the resulting bits on the right are filled with zeros.
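One detail worth keeping in mind, as far as the C promotion rules go: a uint8_t operand is promoted to int before the shift, so the bits that “fall off” are only dropped when the result is converted back to uint8_t. A minimal sketch, where shl_u8 is a hypothetical helper that assumes the shift amount is smaller than 8:

```c
#include <stdint.h>

/* Shifts an 8-bit value left by k positions (assumes k < 8).
 * v is promoted to int before the shift, so the high bits survive in the
 * intermediate int and are only discarded by the cast back to uint8_t. */
uint8_t shl_u8(uint8_t v, unsigned k) {
    return (uint8_t)(v << k);
}
```

So shl_u8(0xAA, 1) yields 0x54, while the raw expression (uint8_t)0xAA << 1 is the int value 0x154.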
The right shift operation is a bitwise operation, denoted with the symbols >>, that shifts the bits of a binary number to the right by a given number of positions.
So, for example, if we want to shift 0xAA by 4 positions, by performing 0xAA>>4 we will obtain 0x0A:
uint8_t a = 0xAAu; // 1010 1010 = 0xAA
uint8_t b = a >> 4; // 0000 1010 = 0x0A
printf("After shift: 0x%X\n", b);
// Output
// After shift: 0xA
Visually things look like this:
Having read this far, you may feel an elephant lurking in the server room. We haven’t touched on a vital subject: how are signed integers represented in binary?
In the C programming language, the fixed-width signed integer types (int8_t, int16_t, int32_t, int64_t) are represented using Two’s Complement. The most significant bit of the number (also called MSB) is used for the sign, and the rest of the bits are used to store the number’s value. The sign bit is 0 for positive numbers and 1 for negative numbers. The number 0 also has its sign bit set to 0, so, by convention, it sits on the positive side.
In Two’s Complement, a negative number is obtained by flipping all the bits (~) of its positive counterpart and then adding 1.
For example, to obtain the binary representation of -47, we should do the following:
1. Write 47 in binary: 00101111;
2. Flip all the bits of 47: 11010000;
3. Add 1 to the result of the previous step: 11010001.
So, -47 in binary is 11010001.
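The three steps (flip, then add 1) can be sketched in C as follows. This is only an illustration: twos_complement_u8 is a hypothetical name, and the function works on raw uint8_t bit patterns so that every conversion involved stays well-defined.

```c
#include <stdint.h>

/* Returns the 8-bit two's complement pattern of -value:
 * flip all the bits (~), then add 1. The arithmetic happens on the
 * promoted value; the cast keeps only the lowest 8 bits. */
uint8_t twos_complement_u8(uint8_t value) {
    return (uint8_t)(~value + 1u);
}
```

Calling twos_complement_u8(47) gives 0xD1 (11010001), the same bit pattern as (uint8_t)-47.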
Another example. To obtain the binary representation of -36, we should do the following:
1. Write 36 in binary: 00100100;
2. Flip all the bits of 36: 11011011;
3. Add 1 to the result from the previous step: 11011100.
Because the sign bit is reserved, signed integers have one bit less to represent the actual numerical value. The maximum positive value an int8_t can hold is \(2^7-1=127\), and it has the following representation:
The minimum value a signed integer of type int8_t can hold is \(-2^7=-128\). At this point, you may wonder why the negative side goes down to -128 while the positive side stops at 127. This happens because 0 takes one of the 128 bit patterns whose sign bit is 0, leaving only 127 strictly positive values, while all 128 patterns with the sign bit set are negative. -128 has the following representation:
You don’t have to do any computations to determine the maximum and minimum values a signed fixed-length type can get. The limits are already defined as macro constants in "stdint.h"
:
#include <stdio.h>
#include <stdint.h> // constant macros are included here
int main(void) {
printf("int8_t is in the interval: [%hhd, %hhd]\n", INT8_MIN, INT8_MAX);
printf("int16_t is in the interval: [%hd, %hd]\n", INT16_MIN, INT16_MAX);
printf("int32_t is in the interval: [%d, %d]\n", INT32_MIN, INT32_MAX);
printf("int64_t is in the interval: [%lld, %lld]\n", (long long)INT64_MIN, (long long)INT64_MAX);
return 0;
}
// Output
// int8_t is in the interval: [-128, 127]
// int16_t is in the interval: [-32768, 32767]
// int32_t is in the interval: [-2147483648, 2147483647]
// int64_t is in the interval: [-9223372036854775808, 9223372036854775807]
Just like in the previous example, it’s advisable to use the right format macros (defined in inttypes.h) for the family of fixed-length signed integers: PRId8, PRId16, etc.
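For illustration, here is a sketch that uses the format macros from inttypes.h to format the int8_t and int64_t ranges into a buffer instead of printing them. The function names are hypothetical; the macros themselves (PRId8, PRId64) are standard.

```c
#include <inttypes.h> /* PRId8, PRId64, ...; also pulls in <stdint.h> */
#include <stdio.h>

/* Writes "[min, max]" for int8_t into buf.
 * Returns what snprintf returns: the number of characters it wanted to write. */
int format_int8_range(char *buf, size_t len) {
    return snprintf(buf, len, "[%" PRId8 ", %" PRId8 "]", INT8_MIN, INT8_MAX);
}

/* Same for int64_t: PRId64 expands to the right length modifier
 * for the platform ("ld" or "lld"), so no cast is needed. */
int format_int64_range(char *buf, size_t len) {
    return snprintf(buf, len, "[%" PRId64 ", %" PRId64 "]", INT64_MIN, INT64_MAX);
}
```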
In the C programming language, UB (a cute acronym for Undefined Behavior) refers to situations (usually corner cases, but not always) where the C Standard imposes no requirements on the result of executing a piece of code. In those cases, compilers can do things their way: crashing, giving erroneous or platform-dependent results (worse than crashing), or trolling us with Heisenbugs. Some cases of UB are obvious, while others are subtle and hard to detect.
In the C community, undefined behaviour may be humorously called “nasal demons” after a comp.std.c post that explained undefined behaviour as allowing the compiler to do anything it chooses, even “to make demons fly out of your nose”. (source)
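Just like when managing memory yourself, there are a few things you should take into consideration when writing C code that uses bitwise operations:
A. Do not shift by an amount greater than or equal to the width of the (promoted) left operand. A uint8_t is promoted to int before the shift, so here the limit is the width of int (32 bits on common platforms):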
uint8_t a = 32;
uint8_t b = a << 32; // undefined behavior
// code compiles just fine
// don't assume the number is 0x0
If you try to compile this simple piece of code with -Wall, the compiler (both clang and gcc) will warn you about the potential problem, but the code will still compile:
bits.c:150:19: warning: left shift count >= width of type [-Wshift-count-overflow]
150 | uint8_t b = a << 32; // undefined behavior
If I execute the code after compiling it, b
is 0
. But don’t assume it will be 0
on all platforms or with all compilers. That’s wrong.
Also, don’t rely on compiler warnings. They can only be raised in particular cases. Take a look at this code that can lead to UB:
srand(time(NULL));
uint8_t a = 32;
int shifter = rand();
uint8_t b = a << shifter;
printf("%hhu\n", b);
The code compiles without any warning and executes fine. The compiler couldn’t determine the value of the shifter
at compile time, so no warning
was raised. So whenever you are performing bitwise operations (especially shifts), you’d better know what you are doing.
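One possible defensive pattern, sketched under the assumption that we work with uint32_t values, is to check the shift amount before shifting. checked_shl_u32 is a hypothetical helper, not a standard function.

```c
#include <stdint.h>

/* Shifts v left by amount only when the amount is valid for a 32-bit
 * unsigned type. Returns 1 and stores the result in *out on success;
 * returns 0 and leaves *out untouched otherwise. */
int checked_shl_u32(uint32_t v, int amount, uint32_t *out) {
    if (amount < 0 || amount >= 32)
        return 0; /* the shift itself would be undefined behavior: refuse */
    *out = v << amount;
    return 1;
}
```

With the rand() example above, we would call checked_shl_u32(a, shifter, &result) and handle a 0 return value instead of shifting blindly.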
B. Do not shift bits using negative amounts:
uint8_t a = 0x1u;
uint8_t b = a << -2; // undefined behavior
// code compiles just fine
C. Do not left-shift negative signed integers (or signed values whose result does not fit in the type):
int8_t a = -1;
int8_t b = a << 1; // undefined behavior
// code compiles just fine
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Mandatory Computer Science Exercise - The solitary integer
Now that we understand the basics of bitwise operations, let’s solve a classical Computer Science exercise called The solitary integer. If you are curious, you can probably find it on LeetCode and HackerRank (under the name The Lonely Integer).
The ask is straightforward:
Given an array of integer values, where all elements **but one** occur twice,
find the unique element, the so-called _solitary_ integer.
For example, if `L={1,2,3,3,8,1,9,2,9}`, the unique element is `8`,
because the rest of the elements come in pairs.
The first reflex to solve this exercise would be to brute force the solution by verifying each element with every other to find its pair. But the complexity of doing so is O(n^{2}), where n is the size of the input array - not excellent. But, as a rule of thumb, if you receive a question like this at an interview and don’t know how to approach it, mentioning the brute-force solution is a good starting point and can buy you some time until you come up with something better.
There are, of course, other alternative solutions:
- Sort the array in O(nlogn) and then iterate through it by i+=2. If L[i]!=L[i+1], you’ve just found the lonely integer (if you reach the last element without a mismatch, that last element is the lonely one).
- Count how many times each element occurs (for example, in a hash map). Once you find an element whose count is 1, the problem is solved; you’ve just found the lonely integer.
But all those solutions are slightly overkill if you know about XOR. As we’ve said earlier, XOR nullifies identical bits. We also know that XOR is associative and commutative. So, why don’t we apply XOR between all the numbers? In the end, only the bits without pairs will “survive”. Those surviving bits hold the answer to our problem.
For the example input {1,2,3,3,8,1,9,2,9}:
- the bits from array[0] will (eventually) nullify themselves with the bits from array[5] => array[0] ^ array[5] == 0;
- the bits from array[1] will (eventually) nullify themselves with the bits from array[7] => array[1] ^ array[7] == 0;
- the bits from array[2] will (eventually) nullify themselves with the bits from array[3] => array[2] ^ array[3] == 0;
- the bits from array[6] will (eventually) nullify themselves with the bits from array[8] => array[6] ^ array[8] == 0;
- the bits from array[4] will remain unaltered; they represent the solution.
So, the solution of the exercise becomes:
#include <stddef.h>
#include <stdio.h>
static int with_xor(int *array, size_t array_size) {
int result = 0;
for(size_t i = 0; i < array_size; ++i)
result^=array[i]; // pairs cancel out, only the solitary value survives
return result;
}
int main(void) {
int array[9] = {1,2,3,3,8,1,9,2,9};
printf("%d\n", with_xor(array, 9));
return 0;
}
// Output
// 8
In a previous section of the article, we devised a solution to transform numbers from one numeric system to another. Chances are we will never have to convert to base 11. So why not write a dedicated function that prints a number in binary using bitwise operations?
The simplest solution I could think of is the following:
void print_bits_simple_rec(FILE *out, uint16_t n) {
if (n>>1)
print_bits_simple_rec(out, n>>1);
fprintf(out, "%u", n&0x1u); // n&0x1u is an unsigned int
}
print_bits_simple_rec is a recursive function that takes a uint16_t and prints its bits. At each recursive call, we shrink the number by shifting it one bit to the right (n>>1). We stop the recursive calls once n>>1 reaches 0. As the recursive stack unwinds, each call prints the last bit of its own n (n&0x1).
It’s beyond the scope of this article to explain recursion, but let’s see how things execute if we call the function on n=0b00011011
:
And then, once n=0b00000001
, we start printing characters backwards:
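If recursion is not an option (for example, on a small embedded stack), the same idea can be written iteratively. The sketch below is an alternative to print_bits_simple_rec, not the function above: it collects the bits from last to first with n & 0x1u and n >>= 1, then reverses them, which is what the unwinding call stack does for us in the recursive version. bits_to_string_u16 is a hypothetical name, and the result goes into a caller-provided buffer so it is easy to check.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the bits of n into buf, most significant first, without leading
 * zeros (same output as print_bits_simple_rec). buf must hold at least
 * 17 chars. Returns the number of digits written. */
size_t bits_to_string_u16(uint16_t n, char *buf) {
    char tmp[16];
    size_t len = 0;
    do {
        tmp[len++] = (char)('0' + (n & 0x1u)); /* last bit first */
        n >>= 1;                               /* drop the bit we just stored */
    } while (n != 0);
    for (size_t i = 0; i < len; i++)           /* reverse: first bit first */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return len;
}
```

For n=0b00011011 (27), the buffer ends up holding "11011".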
That’s one idea. Another idea is to use a lookup table where we keep the binary strings associated with all the numbers from 0 to 15:
const char* bit_rep[16] = {
"0000", "0001", "0010", "0011",
"0100", "0101", "0110", "0111",
"1000", "1001", "1010", "1011",
"1100", "1101", "1110", "1111",
};
We can then write a few functions that re-use bit_rep. For example, if we plan to print a uint8_t, all we need to do is write this function:
void print_bits_uint8_t(FILE *out, uint8_t n) {
fprintf(out, "%s%s", bit_rep[n >> 4], bit_rep[n & 0xF]);
}
int main(void) {
uint8_t n = 145;
print_bits_uint8_t(stdout, n);
}
// Output
// 10010001
This new function works like this:
- uint8_t n has 8 bits in total;
- if we split n into two halves of 4 bits each, we can use bit_rep[half1] and bit_rep[half2] to print the content of n;
- to split n into two halves, we have to:
  - use n>>4 to get the first 4 bits;
  - use n & 0xF to get the last 4 bits.
If you are confused about n>>4 and n & 0xF, let’s visualise what’s happening and how bits move. We will use n=145 to exemplify.
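The same split can be expressed without a FILE*. The sketch below assumes a caller-provided buffer of 9 chars and fills it with the two table entries; for n=145, 145>>4 is 9 (1001) and 145 & 0xF is 1 (0001). bits_u8_to_string is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

static const char *bit_rep[16] = {
    "0000", "0001", "0010", "0011",
    "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011",
    "1100", "1101", "1110", "1111",
};

/* Like print_bits_uint8_t, but writes into out (8 digits + '\0'). */
void bits_u8_to_string(uint8_t n, char out[9]) {
    unsigned high = n >> 4;   /* first 4 bits: 145 -> 9 (1001) */
    unsigned low  = n & 0xFu; /* last 4 bits:  145 -> 1 (0001) */
    snprintf(out, 9, "%s%s", bit_rep[high], bit_rep[low]);
}
```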
If we consider the following:
- a uint16_t variable can contain two uint8_t variables;
- a uint32_t variable can contain two uint16_t variables;
- a uint64_t variable can contain two uint32_t variables;
we can then write the following code, where each function re-uses the function of the lesser type. The idea is the same: we split the greater type into two halves and pass them to the function associated with the lesser type.
void print_bits_uint8_t(FILE *out, uint8_t n) {
fprintf(out, "%s%s", bit_rep[n >> 4], bit_rep[n & 0xFu]);
}
void print_bits_uint16_t(FILE *out, uint16_t n) {
print_bits_uint8_t(out, n >> 8); // first 8 bits
print_bits_uint8_t(out, n & 0xFFu); // last 8 bits
}
void print_bits_uint32_t(FILE *out, uint32_t n) {
print_bits_uint16_t(out, n >> 16); // first 16 bits
print_bits_uint16_t(out, n & 0xFFFFu); // last 16 bits
}
void print_bits_uint64_t(FILE *out, uint64_t n) {
print_bits_uint32_t(out, n >> 32); // first 32 bits
print_bits_uint32_t(out, n & 0xFFFFFFFFu); // last 32 bits
}
Having separate functions for each type is not ideal, but it is prevalent in the C programming language. Fortunately, we can use the C11 _Generic keyword (generic selection) inside a macro to group the functions under a single name.
#define print_bits(where, n) _Generic((n), \
uint8_t: print_bits_uint8_t, \
int8_t: print_bits_uint8_t, \
uint16_t: print_bits_uint16_t, \
int16_t: print_bits_uint16_t, \
uint32_t: print_bits_uint32_t, \
int32_t: print_bits_uint32_t, \
uint64_t: print_bits_uint64_t, \
int64_t: print_bits_uint64_t) \
(where, n)
So now, we can simply call print_bits() regardless of the input type (as long as the type is covered by a branch of the _Generic selection):
uint8_t a = 145;
uint16_t b = 1089;
uint32_t c = 30432;
int32_t d = 3232;
print_bits(stdout, a); printf("\n"); // works on uint8_t !
print_bits(stdout, b); printf("\n"); // works on uint16_t !
print_bits(stdout, c); printf("\n"); // works on uint32_t !
print_bits(stdout, d); printf("\n"); // works on int32_t !
// Output
// 10010001
// 0000010001000001
// 00000000000000000111011011100000
// 00000000000000000000110010100000
In low-level programming, bitwise masking involves the manipulation of individual bits of a number (represented in binary) using the operations we’ve described in the previous sections (&
, |
, ~
, ^
, >>
, <<
). A mask is a binary pattern that extracts and manipulates specific bits of a given value.
Using bitmasking techniques, we can read, set (to 0 or 1), toggle, or clear specific bits of a number.
Let’s take a look at the previously defined function, print_bits_uint8_t, that prints the binary representation of a uint8_t:
void print_bits_uint8_t(FILE *out, uint8_t n) {
fprintf(out, "%s%s", bit_rep[n >> 4], bit_rep[n & 0xFu]);
}
0xF is the mask we use to select the last 4 bits of n. This happens when we apply n&0xF: every 1 bit in the mask keeps the corresponding bit of n, and every 0 bit in the mask discards (zeroes) the corresponding bit of n:
When we create the mask, we can write the pattern by hand, using hexadecimal literals, or we can express it using powers of 2. For example, if we want a simple mask for one bit on the nth position, we can write 1<<nth, because \(2^{nth}\) == 1<<nth:
We can also “flip” the mask using the ~(mask)
operation:
To get a “contiguous” zone of 1s, we can subtract 1 from the corresponding power of two:
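The three mask recipes above (a single bit, its negation, and a contiguous zone of 1s) can be sketched as small helpers. The names are hypothetical, and the special case for k >= 32 is an assumption added to avoid the undefined 1u << 32.

```c
#include <stdint.h>

/* Mask with a single 1 on position nth (assumes nth < 32). */
uint32_t single_bit_mask(unsigned nth) {
    return 1u << nth;
}

/* Mask with the lowest k bits set to 1, i.e. 2^k - 1.
 * k >= 32 is handled separately because 1u << 32 would be UB. */
uint32_t low_bits_mask(unsigned k) {
    return (k >= 32) ? 0xFFFFFFFFu : (1u << k) - 1u;
}
```

Flipping a mask is just ~single_bit_mask(5), which gives 0xFFFFFFDF.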
In the famous book Cracking The Coding Interview, there’s one exercise where the reader is asked to swap the even bits with the odd bits inside a number:
If we ignore the requirement to use as few instructions as possible, our programmer’s reflex would be to:
Of course, a simpler solution uses bitwise operations and the masking technique. Spoiler alert: we will start with the actual solution, followed by some in-depth explanations:
uint16_t pairwise_swap(uint16_t n) {
return ((n&0xAAAAu)>>1) | ((n&0x5555u)<<1);
}
Cryptic, but simple:
uint16_t n = 0xBCDDu;
uint16_t n_ps = pairwise_swap(n);
print_bits(stdout, n); printf("\n");
print_bits(stdout, n_ps); printf("\n");
// Output
// 1011110011011101
// 0111110011101110
The key to understanding the solution lies in the patterns described by the binary numbers 0xAAAA and 0x5555. Counting bit positions from 1, starting with the rightmost bit, 0xAAAA selects all the even bits of n, while 0x5555 selects all the odd bits of n. If we put the numbers side by side, we should see that:
At this point, the information initially contained in the input number (n=0xBCDD) is split between two numbers:
- 0xBCDD & 0x5555 will contain the odd bits of 0xBCDD;
- 0xBCDD & 0xAAAA will contain the even bits of 0xBCDD.
Now we need to swap them. We shift the even bits one position to the right and the odd bits one position to the left, so we don’t lose any. To recombine the two interlacing patterns, we use the | operation.
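Following the numbers for n=0xBCDD: 0xBCDD & 0xAAAA is 0xA888 and 0xBCDD & 0x5555 is 0x1455; after shifting we get 0x5444 and 0x28AA, and 0x5444 | 0x28AA is 0x7CEE, which matches the printed output 0111110011101110. The same masks extend naturally to 32 bits; pairwise_swap32 below is a hypothetical variant, not part of the original exercise.

```c
#include <stdint.h>

/* 32-bit variant of pairwise_swap: the masks are the 16-bit patterns
 * 0xAAAA / 0x5555 repeated over 32 bits. */
uint32_t pairwise_swap32(uint32_t n) {
    uint32_t even = n & 0xAAAAAAAAu; /* bits selected by 0xAA... */
    uint32_t odd  = n & 0x55555555u; /* bits selected by 0x55... */
    return (even >> 1) | (odd << 1); /* move each group into the other's slots */
}
```

Applying the swap twice returns the original number, since every pair of bits is simply swapped back.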
Getting the nth bit of a number
To get the nth bit of a number n, we can use the & and >> bitwise operations:
1. shift the number to the right (>>) by nth positions;
2. apply &0x1 to obtain the last bit.
Most online resources (ChatGPT included) will recommend the following two solutions for retrieving the nth bit:
A macro:
#define GET_BIT(n,pos) (((n)>>(pos))&1)
Or a method:
int get_bit(int num, int n) {
return (num >> n) & 0x1u;
}
Visually, both solutions work like this:
Depending on the context, I prefer using a function instead of a macro, because it lets us validate the input and ensure n is not negative or greater than or equal to the width (in bits) of num. Otherwise, things can lead to UB fast:
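// static: a plain C99 "inline" definition may not provide a symbol to link against
static inline uint8_t get_nth_bit(uint32_t num, uint32_t nth) {
if (nth>=32) {
// Catch error
// Log & Manage the error
return 0; // never shift by 32 or more (UB)
}
return (num>>nth)&0x1u;
}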
Let’s try it in practice now:
int main(void) {
uint32_t n = 0xFFu;
int i = 0;
printf("Printing last 8 bits:\n");
for(; i < 8; i++)
printf("%hhu", get_nth_bit(n,i));
printf("\nPrinting the first 24 bits:\n");
for(; i < 32; i++)
printf("%hhu", get_nth_bit(n,i));
}
// Output
// Printing last 8 bits:
// 11111111
// Printing the first 24 bits:
// 000000000000000000000000
Setting the nth bit of a number
The nth bit of a number n can be set to 0 or 1, and, depending on the context, we can end up having two functions or macros:
// Either macros...
#define set_nth_bit1(num, pos) ((num) |= (1u << (pos)))
#define set_nth_bit0(num, pos) ((num) &= ~(1u << (pos)))
// ...or functions (pick one: the macros would clash with functions of the same name)
static inline void set_nth_bit0(uint32_t *n, uint8_t nth) {
*n &= ~(1u << nth);
}
static inline void set_nth_bit1(uint32_t *n, uint8_t nth) {
*n |= (1u << nth);
}
Because both the functions and the macros can lead to UB, it’s advisable to validate nth to make sure it’s smaller than the width (in bits) of the type of n (in our case, uint32_t, so nth should be smaller than 32).
Using the functions in code:
uint32_t n = 0x00FFu;
print_bits(stdout, n); printf("\n");
set_nth_bit0(&n, 5);
printf("bit 5 of n is: %hhu\n", get_nth_bit(n, 5));
print_bits(stdout, n); printf("\n");
set_nth_bit1(&n, 5);
printf("bit 5 of n is: %hhu\n", get_nth_bit(n, 5));
print_bits(stdout, n); printf("\n");
// Output
// 00000000000000000000000011111111
// bit 5 of n is: 0
// 00000000000000000000000011011111
// bit 5 of n is: 1
// 00000000000000000000000011111111
Visually, set_nth_bit0
looks like this:
Applying &
between 0
and 1
will always return 0
. So we create a mask for the 5th bit (1<<5
), we flip it (~(1<<5)
) so we get a 0
on the 5th position, and then we apply &
(bitwise AND
). The 1
doesn’t stand a chance.
Visually, set_nth_bit1
looks like this:
Applying |
between 0
and 1
returns 1
. So we create a mask for the 5th bit (1<<5
), then apply |
between the mask and the number to set that bit to 1.
Toggling the nth bit of a number
Toggling a bit of a number means changing the value of a specific bit from 0 to 1 or from 1 to 0, while leaving all the other bits unchanged.
The first reflex would be to re-use the previously defined functions set_nth_bit1(...)
and set_nth_bit0(...)
to improvise something like:
void toggle_nth_bit(uint32_t *n, uint8_t nth) {
if (get_nth_bit(*n,nth)==0) { // get_nth_bit expects the value, not the pointer
set_nth_bit1(n, nth);
} else {
set_nth_bit0(n, nth);
}
}
But there’s a better and simpler way that avoids branching altogether and uses XOR:
void toggle_nth_bit(uint32_t *n, uint8_t nth) {
*n ^= (1u<<nth);
}
The idea is quite simple; we create a mask with 1
on the nth
position (1<<nth
), and then we ^
(XOR
) the number n
with the mask
. This will preserve all the bits of n, except the nth bit, which changes value depending on its current state (it toggles).
Let’s visualise this by imagining calling toggle_nth_bit(0xF0u, 3)
:
The result of toggle_nth_bit(0xF0u, 3)
should be 0xF8
:
uint32_t n = 0xF0u;
toggle_nth_bit(&n, 3);
if (n==0xF8u) {
printf("It works!");
}
// Output
// It works!
Or we perform the inverse operation on the same bit:
uint32_t n = 0xF8u;
toggle_nth_bit(&n, 3);
if (n==0xF0u) {
printf("It works!");
}
// Output
// It works!
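Because x ^ m ^ m == x, toggling the same bit twice always restores the original value. A small brute-force sketch (toggle_is_involution is a hypothetical name) checks this for every bit position of a given uint32_t, assuming nth stays below 32.

```c
#include <stdint.h>

static void toggle_nth_bit(uint32_t *n, uint8_t nth) {
    *n ^= (1u << nth); /* assumes nth < 32 */
}

/* Returns 1 if, for every bit position, one toggle changes the value
 * and a second toggle restores it; returns 0 otherwise. */
int toggle_is_involution(uint32_t value) {
    for (uint8_t nth = 0; nth < 32; nth++) {
        uint32_t copy = value;
        toggle_nth_bit(&copy, nth);
        if (copy == value)   /* a single toggle must change the value */
            return 0;
        toggle_nth_bit(&copy, nth);
        if (copy != value)   /* a second toggle must restore it */
            return 0;
    }
    return 1;
}
```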
So let’s say we have a number n
. Our task is to write a generic method that clears the last nbits
of that number.
The solution is simple:
- create a mask where all the bits are 1, except the last nbits, which are 0;
- apply the & operation between n and the newly created mask.
To create the mask, we start from a value where all the bits are set to 1. This value can be easily obtained by flipping all the bits of 0: ~0x0u. Next, we left shift it by nbits and, voilà, the mask is ready.
The code is:
void clear_last_bits(uint32_t *n, uint8_t nbits) {
*n &= (~(0x0u) << nbits);
}
To test it, let’s try to clear the last 4 bits of 0xFF
. The result should be: 0xF0
:
uint32_t n = 0xFFu;
clear_last_bits(&n, 4);
if (n==0xF0u) {
printf("It works!\n");
}
// Output
// It works!
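clear_last_bits has the same weakness as the other shift-based helpers: nbits == 32 would shift a 32-bit unsigned value by its full width, which is undefined behavior. A possible guarded variant is sketched below; treating nbits >= 32 as “clear everything” is an assumption, not something the original function specifies.

```c
#include <stdint.h>

/* Variant of clear_last_bits that avoids UB when nbits >= 32.
 * Assumption: clearing 32 or more bits of a uint32_t clears all of them. */
void clear_last_bits_checked(uint32_t *n, uint8_t nbits) {
    if (nbits >= 32) {
        *n = 0;
        return;
    }
    *n &= (~0x0u << nbits); /* 1s everywhere except the last nbits positions */
}
```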
Visually, the operation looks like this:
In this case, the ask is simple: given a uint16_t N and two bit indices, i and j, we need to write a method that replaces all the bits of N between i and j (both inclusive) with the value of M. In other words, M becomes a substring of N that starts at i and ends at j.
The signature of the method should be the following:
void replace_bits(uint16_t *n, uint16_t m, uint8_t i, uint8_t j);
A simple solution that doesn’t impose any validation on i
, j
or m
can be:
void replace_bits(uint16_t *n, uint16_t m, uint8_t i, uint8_t j) {
// Creates a mask to clear the bits from i to j inside N
// The mask is made out of two parts that are stitched together using
// a bitwise OR
uint16_t mask = (~0x0u << (j+1)) | ((1<<i)-1);
// Clear the bits associated with the mask
*n &= mask;
// Align the bits to be replaced
m <<=i;
// Replace the bits from n with the value of m
*n |= m;
}
Executing the code:
uint16_t n = 0xDCBEu;
print_bits(stdout, n); printf("\n");
replace_bits(&n, 0x1u, 3, 6);
print_bits(stdout, n); printf("\n");
// Output
// 1101110010111110
// 1101110010001110
As you can see, the bits from positions 3 to 6 (inclusive) were replaced with the value of 0x1, which is 0b0001 on 4 bits.
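Since replace_bits does not validate m, any bits of m that do not fit between i and j leak into n: with m=0xFF on positions 3 to 6, bits 7 to 10 would also be set. A possible stricter variant, sketched below under the assumption that i <= j < 16, masks m with the same field it clears. replace_bits_masked is a hypothetical name.

```c
#include <stdint.h>

/* Variant of replace_bits that also masks m, so bits of m that do not fit
 * between i and j cannot leak into n. Assumes i <= j < 16. */
void replace_bits_masked(uint16_t *n, uint16_t m, uint8_t i, uint8_t j) {
    /* 1s exactly on positions i..j (computed in 32 bits, so a 16-bit wide field is fine) */
    uint32_t field = ((1u << (j - i + 1)) - 1u) << i;
    uint32_t cleared = *n & ~field;                  /* zero the target interval */
    uint32_t inserted = ((uint32_t)m << i) & field;  /* drop the extra bits of m */
    *n = (uint16_t)(cleared | inserted);
}
```

With n=0xDCBE, replacing bits 3 to 6 with 0x1 still gives 0xDC8E, while m=0xFF now gives 0xDCFE instead of also touching bits 7 to 10.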
To understand what’s happening behind the scenes, we should go through the algorithm step by step.
Firstly, we need to build a mask that has 0s on the interval defined by i and j and 1s everywhere else. The mask is created by stitching together two sections (using |). The line of code that creates the mask is uint16_t mask = (~0x0u << (j+1)) | ((1<<i)-1); and, visually, it works like this:
The second step is to use the resulting mask to clear the bits from i to j (including j) inside n (*n &= mask;):
The third step is to shift the bits of m by i positions to the left, to align them with the empty portion created by the mask, and then use m as a new mask (*n |= m;):
Reading multiple bits (instead of replacing them) is a similar task to the one described above. We must write a method that works on a uint16_t
and two bit indices i
and j
. We need to extract and return the value of all the bits between i
and j
(including j
).
A proposed C function might look like this:
uint16_t get_bits(uint16_t input, uint8_t i, uint8_t j) {
uint16_t mask = (1u << (j - i + 1)) - 1;
mask <<= i;
return (input & mask) >> i;
}
Or, if we enjoy confusing our colleagues, we can try something like this:
uint16_t get_bits2(uint16_t input, uint8_t i, uint8_t j) {
uint16_t mask = (1u << (j + 1)) - (1 << i);
return (input & mask) >> i;
}
When we try them in practice, magic unfolds:
uint16_t n = 0xDCBE;
print_bits(stdout, n); printf("\n");
replace_bits(&n, 0x7u, 3, 6);
print_bits(stdout, n); printf("\n");
print_bits(stdout, get_bits(n, 3, 6)); printf("\n");
print_bits(stdout, get_bits2(n, 3, 6)); printf("\n");
// Output
// 1101110010111110
// 1101110010111110
// 0000000000000111
// 0000000000000111
It all boils down to how we’ve decided to implement the masking mechanisms:
uint16_t mask = (1u << (j - i + 1)) - 1; mask <<= i; // OR
uint16_t mask = (1u << (j + 1)) - (1u << i);
As you can see, in both versions (get_bits
and get_bits2
), we’ve decided to create the mask in one go without stitching together two sections as we did in replace_bits
.
Let’s take the first version to exemplify further. If we look closer, there’s no magic involved.
(1 << (j - i + 1)) -1
------------------
power of two -1
That’s a power of two from which we subtract 1
. We know the bit pattern associated with that kind of number (\(2^n-1\)):
So, visually speaking, the mask forms like this:
- j-i+1 gives the length of the mask (the contiguous zone of 1 bits);
- mask <<= i puts the 1 bits in the right position.
There’s a strong relationship between bitwise operations and mathematical operations involving powers of two. This shouldn’t be a mystery or a surprise; after all, we use powers of two to represent numbers in the binary system.
Multiplying a number \(a\) by a power of two, \(a * 2^{n}\), is equivalent to writing a<<n (as long as the result fits in the type), i.e. shifting the bits of a to the left by n positions.
There’s a clear mathematical demonstration for this. If we go back to the beginning of the article and re-use the formula stated there, we know a number in the binary system can be written as a sum of powers of two: \(A_{2} = \sum_{i=0}^{n} a_i * 2^i\), where \(a_{i} \in \{0, 1\}\). If we multiply both sides of the relationship by \(2^m\), the relationship becomes:
\[2^{m} * A_{2} = \sum_{i=0}^{n} a_i * 2^{i+m}\]
We can intuitively see that every bit moved m positions up: the sum no longer starts with \(2^0\), but with \(2^{m}\), so the lowest m positions are now filled with 0 (and, in a fixed-size type, the highest m bits may be lost):
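For instance, taking \(A_{2} = 5 = 101_{2}\) and \(m = 2\) (values picked only for illustration), every term climbs two positions and the two lowest positions become 0:

\[5 * 2^{2} = (1 * 2^{2} + 0 * 2^{1} + 1 * 2^{0}) * 2^{2} = 1 * 2^{4} + 0 * 2^{3} + 1 * 2^{2} + 0 * 2^{1} + 0 * 2^{0} = 10100_{2} = 20\]

which is exactly 5<<2.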
So, if we were to link the mathematical formula with what’s happening at the bit level, let’s take a look at the following picture:
Now let’s see if the compiler knows how to optimise the multiplication without explicitly telling it so. Given the following code:
int main(void) {
srand(0);
int i = rand();
for(; i < (1<<12); i*=4) {
printf("%d\n", 1);
}
return 0;
}
You can see that instead of writing i<<=2
in the loop, we’ve preferred to use the more readable i*=4
. If we compile the code (gcc -O3
for x86-64
) and look at the resulting assembly:
.LC0:
.string "%d\n"
main:
push rbx
xor edi, edi
call srand
call rand
cmp eax, 4095
jg .L2
mov ebx, eax
.L3:
mov esi, 1
mov edi, OFFSET FLAT:.LC0
xor eax, eax
; Shifts the value in ebx to the left by 2 bits (multiplication by 4)
sal ebx, 2
call printf
cmp ebx, 4095
jle .L3
.L2:
xor eax, eax
pop rbx
ret
You will see the compiler is smart enough to detect that one of the operands of i*=4 is a power of two, so it uses the equivalent << instruction, which is sal ebx, 2, where:
- sal is an instruction that stands for shift arithmetic left;
- ebx is the register where our i value is kept;
- 2 is the number of positions we shift by.
Compilers can perform this optimisation for you, so you shouldn’t bother.
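Dividing a number \(a\) by a power of two, \(a \div 2^{n}\), is equivalent to writing a>>n, as long as \(a\) is unsigned or non-negative (for negative signed values, / rounds toward zero, while >> on a negative value is implementation-defined). The mathematical demonstration is analogous to the multiplication one, so we won’t write it here.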
But we can perform the following smoke test:
uint16_t a = 100u;
if (a/2 == a>>1) {
printf("Yes, we are right\n");
}
// Output
// Yes, we are right
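A single value is only a smoke test. The sketch below checks every uint16_t value against every shift amount from 0 to 15; div_matches_shift_u16 is a hypothetical name. It deliberately sticks to unsigned values: in C, -7 / 2 is -3 because division truncates toward zero, whereas -7 >> 1 is implementation-defined and typically gives -4 with an arithmetic shift.

```c
#include <stdint.h>

/* Returns 1 if a / 2^n == a >> n for every uint16_t a and every n in [0, 15].
 * Only unsigned values are tested: for negative signed values the two differ. */
int div_matches_shift_u16(void) {
    for (uint32_t a = 0; a <= UINT16_MAX; a++)
        for (unsigned n = 0; n < 16; n++)
            if (a / (1u << n) != (a >> n))
                return 0;
    return 1;
}
```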
Now let’s look at the following code:
int main(void) {
srand(0); // srand expects an unsigned int seed, not a pointer
uint32_t n = rand(); // generates a random number
while(n!=0) { // while the number is different than 0
n /=2; // we divide it with 2
}
return 0;
}
If we compile the code with gcc -O1
(for x86-64
), the resulting assembly code is:
main:
sub rsp, 8
mov edi, 0
call srand
; Generate a random number and store the result in eax
call rand
; Test if the random number is 0 .
; If it is jump to .L2 otherwise continue
test eax, eax
je .L2
.L3:
; Copy the value of eax to edx
mov edx, eax
; Shift the value of eax to the right with 1 position
shr eax
; Compare the original in edx to 1 and jump back to .L3
cmp edx, 1
ja .L3
.L2:
mov eax, 0
add rsp, 8
ret
The important line here is shr eax, where the compiler shifts eax one position to the right. Why did it do that? Our C code explicitly asked for a division: n /=2;. Well, the compiler realised that the divisor is 2 (and n is unsigned), so there’s no reason to use division instead of a simple >>.
Fun fact: if we rewrite the C code with the bitwise optimisation by replacing the line n/=2
with the line n>>=1
, the resulting assembly code will be identical. Compilers can perform this optimisation for you, so you should rarely bother with mundane optimisations like this.
If we look again at the formula \(A_{2} = \sum_{i=0}^{n} a_i * 2^i\), where \(a_{i} \in \{0, 1\}\), we soon realise that every term except \(a_{0} * 2^{0}\) is a multiple of 2, so their sum is always even (we can use \(2\) as a common factor).
So the only indicator that gives the parity of the number is \(a_{0}*2^{0}\). \(a_{0}\) is the least significant bit, but, in another way, it’s quite a critical fellow because it provides us with the answer to one crucial question: is the number even, or is it odd?
The rule is the following: if the last bit is 1, the number is odd; if it is 0, the number is even.
So, to check the parity of a number, it’s enough to mask it with 0x1 and read the last bit:
uint16_t a = 15u;
uint16_t b = 16u;
printf("a=%d is %s\n", a, a&0x1u ? "odd" : "even");
printf("b=%d is %s\n", b, b&0x1u ? "odd" : "even");
// Output
// a=15 is odd
// b=16 is even
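The bit test also holds for negative values, as long as the type uses two’s complement (which int32_t does, by definition). The % version needs more care, because in C the sign of the remainder follows the dividend. Both helpers below are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Parity via the last bit; works for negative int32_t too: -3 ends in ...101. */
int is_odd_bitwise(int32_t x) {
    return (x & 1) != 0;
}

/* The % version must compare against 0: in C, -3 % 2 == -1, not 1. */
int is_odd_modulo(int32_t x) {
    return (x % 2) != 0;
}
```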
Integer division and the modulo operation % are typically among the slowest arithmetic instructions on most hardware architectures. So whenever we can replace them with something more efficient, it’s advisable, even if compilers can often optimise things like this for you.
As a rule, for unsigned (or non-negative) a, (a % (1<<n)) is equivalent to (a & ((1<<n)-1)), where 1<<n is the bitwise way of saying \(2^n\).
If we go back to the formula, \(A_{2} = \sum_{i=0}^{n} a_i * 2^i\), where \(a_{i} \in \{0, 1\}\), and divide both sides by a power of two, \(2^m\), we will obtain:
\[\frac{A_{2}}{2^{m}} = a_{0} * \frac{1}{2^m} + a_{1} * \frac{1}{2^{m-1}} + a_2*\frac{1}{2^{m-2}} + ... + a_{n-1}*2^{n-m-1} + a_{n} * 2^{n-m}\]
But at some point, the exponent in the denominator of \(\frac{1}{2^{m-j}}\) becomes zero or negative, so the fraction turns upside down into a whole number. This happens when \(j \ge m\): those terms form the quotient, while the terms with \(j < m\) (the last m bits) form the remainder. So, for example, if m = 3, we can write things like:
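Splitting the sum at \(i = 3\) makes this explicit: the first part is a multiple of \(2^3 = 8\), and the part in brackets is at most \(2^2 + 2^1 + 2^0 = 7\), so it is exactly the remainder.

\[A_{2} = 2^{3} * \sum_{i=3}^{n} a_i * 2^{i-3} + (a_2 * 2^{2} + a_1 * 2^{1} + a_0 * 2^{0})\]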
So, to get the remainder, we need to select the last 3 bits (with the mask ((1<<3)-1)) of the number:
uint16_t pow2 = 1u << 3;
for(int i = 1; i < 100; i++) {
printf("%2d mod %d=%d %c",
i, pow2, i & (pow2-1), i&0x7 ? ' ' : '\n');
}
// Output
// 1 mod 8=1 2 mod 8=2 3 mod 8=3 4 mod 8=4 5 mod 8=5 6 mod 8=6 7 mod 8=7 8 mod 8=0
// 9 mod 8=1 10 mod 8=2 11 mod 8=3 12 mod 8=4 13 mod 8=5 14 mod 8=6 15 mod 8=7 16 mod 8=0
// 17 mod 8=1 18 mod 8=2 19 mod 8=3 20 mod 8=4 21 mod 8=5 22 mod 8=6 23 mod 8=7 24 mod 8=0
// 25 mod 8=1 26 mod 8=2 27 mod 8=3 28 mod 8=4 29 mod 8=5 30 mod 8=6 31 mod 8=7 32 mod 8=0
// 33 mod 8=1 34 mod 8=2 35 mod 8=3 36 mod 8=4 37 mod 8=5 38 mod 8=6 39 mod 8=7 40 mod 8=0
// 41 mod 8=1 42 mod 8=2 43 mod 8=3 44 mod 8=4 45 mod 8=5 46 mod 8=6 47 mod 8=7 48 mod 8=0
// 49 mod 8=1 50 mod 8=2 51 mod 8=3 52 mod 8=4 53 mod 8=5 54 mod 8=6 55 mod 8=7 56 mod 8=0
// 57 mod 8=1 58 mod 8=2 59 mod 8=3 60 mod 8=4 61 mod 8=5 62 mod 8=6 63 mod 8=7 64 mod 8=0
// 65 mod 8=1 66 mod 8=2 67 mod 8=3 68 mod 8=4 69 mod 8=5 70 mod 8=6 71 mod 8=7 72 mod 8=0
// 73 mod 8=1 74 mod 8=2 75 mod 8=3 76 mod 8=4 77 mod 8=5 78 mod 8=6 79 mod 8=7 80 mod 8=0
// 81 mod 8=1 82 mod 8=2 83 mod 8=3 84 mod 8=4 85 mod 8=5 86 mod 8=6 87 mod 8=7 88 mod 8=0
// 89 mod 8=1 90 mod 8=2 91 mod 8=3 92 mod 8=4 93 mod 8=5 94 mod 8=6 95 mod 8=7 96 mod 8=0
// 97 mod 8=1 98 mod 8=2 99 mod 8=3
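The mask trick can be wrapped in a helper and cross-checked against % for a range of inputs. The sketch assumes unsigned operands and n < 32; for negative signed values the two differ (-5 % 8 is -5, while -5 & 7 is 3). mod_pow2 and mod_pow2_agrees are hypothetical names.

```c
#include <stdint.h>

/* a % 2^n computed with a mask; assumes n < 32 and an unsigned a. */
uint32_t mod_pow2(uint32_t a, unsigned n) {
    return a & ((1u << n) - 1u);
}

/* Compares the mask against the % operator for every a < limit and n < 12.
 * Returns 1 if they always agree, 0 otherwise. */
int mod_pow2_agrees(uint32_t limit) {
    for (uint32_t a = 0; a < limit; a++)
        for (unsigned n = 0; n < 12; n++)
            if (mod_pow2(a, n) != a % (1u << n))
                return 0;
    return 1;
}
```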
Without taking bitwise operations into consideration, our first reflex to check if a number is a power of two is to use logarithms. It’s not the best solution, and you will shortly see why:
#include <math.h>
int is_power_of_two(int num) {
// Negative numbers are not power of two
// 0 is not a power of two
if (num <= 0) {
return 0;
}
// We compute the logarithm
double log2num = log2(num);
// We check if the logarithm is an integer
return (log2num == floor(log2num));
}
The code looks fine, but it contains a dangerous comparison: log2num==floor(log2num). It’s dangerous because most real numbers cannot be represented exactly as a double: approximation errors can build up, and subtle differences can appear, rendering the comparison useless.
If you don’t believe me, let’s try the following code:
double x = 10 + 0.1 + 0.2 + 0.2; // should be 10.5
double y = 11 - 0.2 - 0.2 - 0.1; // should be 10.5
printf("x and y are%sequal\n", x == y ? " " : " not ");
printf("the difference between the numbers is: %1.16f\n", x-y);
// Output
// x and y are not equal
// the difference between the numbers is: -0.0000000000000036
A disputed strategy for solving this is to introduce an epsilon (a very small value representing the tolerance) and compare doubles for approximate equality. So instead of comparing x==y directly, we compare the absolute value of their difference with epsilon:
double epsilon = 0.000001;
if (fabs(x-y) <= epsilon) {
// the numbers are equal or almost equal
}
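A fixed epsilon is too strict for large numbers and too loose for tiny ones. The sketch below shows one common refinement: it accepts a difference within an absolute tolerance or within a tolerance relative to the larger magnitude. The function name nearly_equal and both tolerance values are our own choices, not a standard API. It uses a small abs_d helper instead of fabs() so that it needs nothing beyond <stdbool.h>.

```c
#include <stdbool.h>

// Absolute value for doubles (fabs() from <math.h> would work just as well).
static double abs_d(double x) { return x < 0 ? -x : x; }

// Sketch: a and b count as "equal" if their difference is within an absolute
// tolerance (useful near 0) or within a tolerance relative to the larger of
// the two magnitudes (useful for big numbers). Both tolerances are the
// caller's choice; no single value fits every program.
bool nearly_equal(double a, double b, double abs_eps, double rel_eps) {
    double diff = abs_d(a - b);
    if (diff <= abs_eps) {
        return true;  // close enough in absolute terms
    }
    double largest = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return diff <= largest * rel_eps;  // close enough relative to the magnitude
}
```

With abs_eps = 1e-12 and rel_eps = 1e-9, the x and y from the previous example compare as equal, while 1.0 and 1.1 do not.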
This doesn't solve the problem by itself, but it can greatly reduce the number of errors we get. So why don't we do it the professional way? A simple bitwise trick that determines whether a number is a power of two is to write a function like this:
bool is_pow2(uint16_t n) {
return (n & (n-1)) == 0;
}
And when we test it, everything looks fine:
uint16_t a = 1u<<2,
b = 1u<<3,
c = (1u<<3) + 1;
printf("%hu is a power of two: %s.\n", a, is_pow2(a) ? "yes" : "no");
printf("%hu is a power of two: %s.\n", b, is_pow2(b) ? "yes" : "no");
printf("%hu is a power of two: %s.\n", c, is_pow2(c) ? "yes" : "no");
// Output
// 4 is a power of two: yes.
// 8 is a power of two: yes.
// 9 is a power of two: no.
Spoiler alert: the function has one subtle bug. It doesn't behave correctly when n is 0. Let's try it:
uint16_t a = 0x0u;
printf("%hu is a power of two: %s.\n", a, is_pow2(a) ? "yes" : "no");
// Output
// 0 is a power of two: yes.
Mathematicians will say: raising a non-zero number to a natural power never results in 0. So our code should be rewritten as follows to handle this corner case:
bool is_pow2(uint16_t n) {
return n && !(n & (n-1));
}
Now that things are sorted, let's see why the function works. Firstly, a power of two has, in its binary representation, exactly one bit set to 1, in the column corresponding to the power:
When we subtract 1 from a power of two, that bit becomes 0 and all the bits to its right become 1:
So if we put those two pictures side by side, we should see how things are going:
We can see that all the bits nullify themselves when we apply &. If the number is not a power of two, it has more than one bit set to 1. Subtracting 1 only affects the lowest of them and the bits to its right, so the higher set bits survive the & and the result is non-zero.
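The explanation boils down to one fact: n & (n-1) clears the lowest set bit of n. The sketch below makes that explicit. It repeats the step to count the set bits, and a power of two is then simply a number with exactly one set bit. The helper names clear_lowest_set_bit, count_set_bits and is_pow2_by_count are ours, for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

// n & (n - 1) clears the lowest set bit of n: subtracting 1 turns that bit
// into 0 and every 0 below it into 1, while all the higher bits stay the same.
static uint16_t clear_lowest_set_bit(uint16_t n) {
    return (uint16_t)(n & (n - 1));
}

// Counting how many times we can clear a bit before reaching 0 gives the
// number of set bits.
static unsigned count_set_bits(uint16_t n) {
    unsigned count = 0;
    while (n) {
        n = clear_lowest_set_bit(n);
        count++;
    }
    return count;
}

// A power of two reaches 0 after exactly one step, which is the property
// is_pow2() relies on. 0 has no set bits, so it is correctly rejected.
bool is_pow2_by_count(uint16_t n) {
    return count_set_bits(n) == 1;
}
```

For example, 12 (1100) becomes 8 (1000) after one step, so 12 is not a power of two. 8 becomes 0 after one step, so 8 is a power of two.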
There are cases in code when, given a number n, you want to determine the smallest power of two that is greater than or equal to n. For example, if n=7, the next power of two is 8. Or, if n=13, the next power of two is 16. If n is already a power of two (for example, 128), the answer is n itself.
The programmer’s reflex would be to write a function like:
uint32_t next_power_of_two_naive(uint32_t n) {
uint32_t r = 1u;
// keep doubling r until it reaches or exceeds n
while (r < n)
r *= 2; // or r <<= 1
return r;
}
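The code works for most inputs, but it's prone to errors. Once r reaches 1u<<31, doubling it again wraps around to 0, so for any n greater than 1u<<31 the loop never ends: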
uint32_t n1=0u, n2=128u, n3=7u, n4=UINT32_MAX;
printf("next power of two for %u is %u\n", n1, next_power_of_two_naive(n1));
printf("next power of two for %u is %u\n", n2, next_power_of_two_naive(n2));
printf("next power of two for %u is %u\n", n3, next_power_of_two_naive(n3));
printf("next power of two for %u is %u\n", n4, next_power_of_two_naive(n4));
// Output
// next power of two for 0 is 1
// next power of two for 128 is 128
// next power of two for 7 is 8
// ^C <--- HAD TO CLOSE THE PROGRAM, INFINITE LOOP
Let's abandon this solution and try to make use of bitwise operations. The new code looks like this:
uint32_t next_power_of_two(uint32_t n) {
n--;
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
n++;
return n;
}
Does it work better?
uint32_t n1=0u, n2=128u, n3=7u, n4=UINT32_MAX;
printf("next power of two for %u is %u\n", n1, next_power_of_two(n1));
printf("next power of two for %u is %u\n", n2, next_power_of_two(n2));
printf("next power of two for %u is %u\n", n3, next_power_of_two(n3));
printf("next power of two for %u is %u\n", n4, next_power_of_two(n4));
// Output
// next power of two for 0 is 0
// next power of two for 128 is 128
// next power of two for 7 is 8
// next power of two for 4294967295 is 0
Well, at least the code doesn't enter an infinite loop when n=UINT32_MAX, but it returns an erroneous result for 0 (the expected answer is 1, the smallest power of two). So let's treat 0 as a special case:
uint32_t next_power_of_two(uint32_t n) {
if (n==0) return 1; // takes care of the special case
n--;
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
n++;
return n;
}
Now, we should also do something when the numbers get closer to UINT32_MAX. As you probably know, UINT32_MAX is not a power of two (it's actually 2^32-1). The next power of two after 1u<<31 would be 2^32, which doesn't fit in a uint32_t, so searching for the next power of two of any number greater than 1u<<31 doesn't make any sense. If we leave the function in its current form:
uint32_t n1=(1u<<31)+1,
n2=(1u<<31)+2,
n3=(1u<<31)+3;
printf("next power of two for %u is %u\n", n1, next_power_of_two(n1));
printf("next power of two for %u is %u\n", n2, next_power_of_two(n2));
printf("next power of two for %u is %u\n", n3, next_power_of_two(n3));
// Output
// next power of two for 2147483649 is 0
// next power of two for 2147483650 is 0
// next power of two for 2147483651 is 0
All the results will be 0. So we should branch the function again and decide what to do when n > (1u<<31).
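One possible way to make that decision, sketched below, is to return 0 as an explicit "no representable answer" marker. This convention and the name next_power_of_two_checked are our own assumptions. Asserting, or returning a separate status flag, would be equally valid choices.

```c
#include <stdint.h>

// Sketch: smallest power of two >= n, or 0 when the answer would be 2^32,
// which does not fit in a uint32_t. Using 0 as the overflow marker is our
// own convention; callers must check for it.
uint32_t next_power_of_two_checked(uint32_t n) {
    if (n == 0) return 1;          // 2^0 is the smallest power of two
    if (n > (1u << 31)) return 0;  // 2^32 is not representable
    n--;                           // so that exact powers of two map to themselves
    n |= n >> 1;                   // smear the highest set bit to the right...
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;                  // ...until every lower bit is 1
    return n + 1;                  // (2^k - 1) + 1 == 2^k
}
```

Since 0 is never a valid power of two, the marker cannot be confused with a real result.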
But now let’s get back to this magic, and see what’s happening:
n--;
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
n++;
Let's assume n=0x4000A8CC (or n=1073785036). Calling next_power_of_two(0x4000A8CC) will return 0x80000000:
n = 01000000000000001010100011001100 (0x4000A8CC)
n-- = 01000000000000001010100011001011 (0x4000A8CB)
n>>1 = 00100000000000000101010001100101
n = 01000000000000001010100011001011
n|(n>>1) = 01100000000000001111110011101111
-->1s 1s<--
n>>2 = 00011000000000000011111100111011
n = 01100000000000001111110011101111
n|(n>>2) = 01111000000000001111111111111111
--->1s 1s<---
n>>4 = 00000111100000000000111111111111
n = 01111000000000001111111111111111
n|(n>>4) = 01111111100000001111111111111111
----->1s 1s<------
n>>8 = 00000000011111111000000011111111
n = 01111111100000001111111111111111
n|(n>>8) = 01111111111111111111111111111111
------->1s 1s<--------
n>>16 = 00000000000000000111111111111111
n = 01111111111111111111111111111111
n|(n>>16) = 01111111111111111111111111111111
--------->1s 1s<---------
n++ = 10000000000000000000000000000000 (0x80000000)
As you can see, at each step we copy the highest set bit into the positions to its right, slowly creating a mask of the form (1<<k)-1, where k is the position of the highest set bit plus one. By the end, adding 1 gives us the next power of two: 1<<k.
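Each step doubles the number of 1s below the highest set bit (1, 2, 4, 8, 16), so five steps cover a 32-bit word. The sketch below assumes we want the same technique for 64-bit values. It needs one extra step, n >> 32, and keeps the special cases from before: 0 maps to 1, and values above 2^63 return 0 (again our own convention). The name next_power_of_two_u64 is ours.

```c
#include <stdint.h>

// Sketch: bit smearing for 64-bit values. Compared to the 32-bit version,
// the only addition is the n >> 32 step.
uint64_t next_power_of_two_u64(uint64_t n) {
    if (n == 0) return 1;                   // 2^0
    if (n > (UINT64_C(1) << 63)) return 0;  // 2^64 does not fit (our marker)
    n--;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;  // spreads the highest bit over the remaining lower half
    return n + 1;
}
```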
BitSet or BitVector are used interchangeably to refer to a data structure representing a collection of bits, each of which can be 0 or 1. A BitSet is like a massive panel with ON/OFF switches. You can alter the state of those ON/OFF switches if you know their position (index in the set).
However, depending on the context, there can be some differences in how the two terms are implemented and used. Sometimes, a BitSet may refer to a fixed-size collection of bits, while a BitVector may refer to a dynamically resizable collection of bits.
For simplicity, we will implement a BitSet using, you've guessed it, bitwise operations. The minimal set of operations a BitSet should support is:
- setting a bit to 1;
- setting a bit to 0;
- reading the value of a bit;
- clearing the BitSet (setting all the bits to 0).
Now that we know how to manipulate the individual bits of an integer, we can say that:
- uint16_t can be considered a fixed-size BitSet with a size of 16;
- uint32_t can be considered a fixed-size BitSet with a size of 32;
- uint64_t can be considered a fixed-size BitSet with a size of 64.
But what if we want a BitSet with a size bigger than 64? Standard C doesn't have a uint128_t (yet!). So for 128 bits, for example, we have to use four uint32_t or two uint64_t. A BitSet is therefore an array of fixed-size numbers (uintN_t), where we index bits by their position in the whole BitSet.
The following diagram describes an array of 4 uint32_t integers with a total of 4*32 ON/OFF switches (bits 0 or 1). Each bit should be accessible by its position in the whole array, not by its position inside its own integer (word):
To implement the BitSet, we will use C macros:
#define SET_NW(n) (((n) + 31) >> 5)
#define SET_W(index) ((index) >> 5)
#define SET_MASK(index) (1U << ((index) & 31))
#define SET_DECLARE(name, size) uint32_t name[SET_NW(size)] = {0}
#define SET_1(name, index) (name[SET_W(index)] |= SET_MASK(index))
#define SET_0(name, index) (name[SET_W(index)] &= ~SET_MASK(index))
#define SET_GET(name, index) (name[SET_W(index)] & SET_MASK(index))
Things can look daunting at first, so let’s take each line and explain what it does.
SET_NW
#define SET_NW(n) (((n) + 31) >> 5)
This macro determines the number of uint32_t words we need to represent a BitSet of a given size.
If, for example, we need 47 positions, then SET_NW(47) will return (47+31)>>5, which is equivalent to (47+31)/32=2 (integer division). Adding 31 before dividing by 32 rounds the result up, so a partially used word still gets allocated. Our array needs at least two uint32_t integers to hold 47 values.
SET_W
#define SET_W(index) ((index) >> 5)
This macro returns the index of the uint32_t word that contains the bit at the given index.
For example, if our BitSet has 64 indices (2 uint32_t words), calling SET_W(35) will return 35>>5, which is equivalent to 35/32=1. So we must look for the bit in the second uint32_t (index 1).
SET_MASK
#define SET_MASK(index) (1U << ((index) & 31))
Based on a given index, this macro returns the mask that selects that individual bit from its uint32_t word. For non-negative indices, index & 31 is equivalent to index % 32.
So, for example, calling SET_MASK(16) creates a mask that selects bit 16 of the corresponding uint32_t word. Calling SET_MASK(35) creates a mask that selects bit 3 of the corresponding uint32_t word.
SET_MASK works at the word level, while SET_W works at the array level.
SET_DECLARE
#define SET_DECLARE(name, size) uint32_t name[SET_NW(size)] = {0}
This macro declares a bitset array (uint32_t[]) with the given name and size, and initializes it to all zeros. After declaration, the BitSet is in a clean state: no ON/OFF switch is activated.
SET_1 and SET_0
#define SET_1(name, index) (name[SET_W(index)] |= SET_MASK(index))
#define SET_0(name, index) (name[SET_W(index)] &= ~SET_MASK(index))
These macros set specific bits inside the BitSet to 1 or 0, respectively. The techniques for doing this were already described earlier in this article.
SET_GET
#define SET_GET(name, index) (name[SET_W(index)] & SET_MASK(index))
This macro checks whether a bit is set. If the bit is set to 1, the macro returns a non-zero value (the mask itself). If the bit is set to 0, the macro returns 0.
BitSet
To test the newly defined "macro" BitSet, we can use this code:
// Declares uint32_t bitset[3] = {0};
SET_DECLARE(bitset, 84);
// Sets the bits 1 and 80 to 1
SET_1(bitset, 1);
SET_1(bitset, 80);
printf("Is bit %d set? Answer: %s.\n", 1, SET_GET(bitset, 1) ? "YES" : "NO");
printf("Is bit %d set? Answer: %s.\n", 2, SET_GET(bitset, 2) ? "YES" : "NO");
printf("Is bit %d set? Answer: %s.\n", 80, SET_GET(bitset, 80) ? "YES" : "NO");
//Output
// Is bit 1 set? Answer: YES.
// Is bit 2 set? Answer: NO.
// Is bit 80 set? Answer: YES.
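The macros above zero the BitSet only when it is declared. Below is a minimal sketch of two extra operations: flipping a single bit and clearing the whole set. It also includes a counter of ON switches that reuses the x & (x - 1) trick. SET_TOGGLE, SET_CLEAR and set_count are hypothetical names, not part of the original macro set. The original macros are repeated so the block compiles on its own.

```c
#include <stdint.h>
#include <string.h>

// The macros from above, repeated so this sketch is self-contained.
#define SET_NW(n) (((n) + 31) >> 5)
#define SET_W(index) ((index) >> 5)
#define SET_MASK(index) (1U << ((index) & 31))
#define SET_DECLARE(name, size) uint32_t name[SET_NW(size)] = {0}
#define SET_1(name, index) (name[SET_W(index)] |= SET_MASK(index))
#define SET_0(name, index) (name[SET_W(index)] &= ~SET_MASK(index))
#define SET_GET(name, index) (name[SET_W(index)] & SET_MASK(index))

// Hypothetical extras: SET_TOGGLE flips one bit with XOR; SET_CLEAR zeroes
// every word. SET_CLEAR only works on a real array (not a pointer),
// because it relies on sizeof(name).
#define SET_TOGGLE(name, index) (name[SET_W(index)] ^= SET_MASK(index))
#define SET_CLEAR(name) memset((name), 0, sizeof(name))

// Counts the ON switches by repeatedly clearing the lowest set bit of each word.
size_t set_count(const uint32_t *words, size_t nwords) {
    size_t count = 0;
    for (size_t w = 0; w < nwords; w++) {
        for (uint32_t x = words[w]; x; x &= x - 1) {
            count++;
        }
    }
    return count;
}
```

Usage follows the earlier test. After SET_1(bitset, 1) and SET_1(bitset, 80), set_count(bitset, SET_NW(84)) returns 2, and SET_CLEAR(bitset) brings it back to 0.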
This bitwise trick is generally uncalled for, but it’s nice to use when you want to deceive your naive colleagues.
Swapping the value of two variables is normally done using an intermediary variable. This is one of the first “algorithms” we learn when we start programming:
void swap(int *a, int *b) {
int tmp = *a;
*a = *b;
*b = tmp;
}
int main(void) {
int a = 5, b = 7;
printf("Before swap: a=%d b=%d\n", a,b);
swap(&a, &b);
printf("After swap: a=%d b=%d\n", a,b);
return 0;
}
But there are two other ways of achieving the same result without the need to introduce a new variable, tmp:
void swap_xor(uint8_t *a, uint8_t *b) {
*a ^= *b;
*b ^= *a;
*a ^= *b;
}
int main(void) {
uint8_t a = 7u;
uint8_t b = 13u;
printf("Before swap: a=%hhu b=%hhu\n", a, b);
swap_xor(&a, &b);
printf("After swap: a=%hhu b=%hhu\n", a, b);
return 0;
}
// Output
// Before swap: a=7 b=13
// After swap: a=13 b=7
What kind of magic is this? Well, after the first two lines of code, *a ^= *b; *b ^= *a; (where *a and *b on the right-hand side stand for the original values), we can say that *b=(*b)^((*a)^(*b)). Because ^ (XOR) is associative and commutative, the relationship also translates to *b=(*b)^(*b)^(*a), which is equivalent to *b=0^(*a) (since x^x=0), which is equivalent to *b=*a (since 0^x=x). Wow, did b just become a?
Again, after the third line, *a ^= *b;, we can write *a=((*a)^(*b))^((*b)^((*a)^(*b))), which simplifies to *a=(*a)^(*a)^(*b), that is, *a=*b. Wow, did a just become b? It looks complicated, but it's not. Put this on paper, and the mystery will unfold.
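The text promises two temporary-free swaps, but only the XOR one is shown. We assume the other is the well-known addition/subtraction swap; that is our guess, as the original doesn't name it. The sketch also guards against a classic XOR-swap pitfall: when both pointers refer to the same variable, *a ^= *b computes x ^ x and zeroes it. The names swap_xor_safe and swap_add_sub are ours.

```c
#include <stdint.h>

// XOR swap with an aliasing guard: swapping a variable with itself
// would otherwise turn it into 0.
void swap_xor_safe(uint8_t *a, uint8_t *b) {
    if (a == b) return;  // same object: nothing to swap
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}

// Addition/subtraction swap. Unsigned arithmetic wraps around (mod 256 for
// uint8_t), so an intermediate "overflow" is harmless and well defined.
void swap_add_sub(uint8_t *a, uint8_t *b) {
    if (a == b) return;       // same aliasing problem: *a - *a would be 0
    *a = (uint8_t)(*a + *b);  // a' = a + b
    *b = (uint8_t)(*a - *b);  // b' = (a + b) - b = a
    *a = (uint8_t)(*a - *b);  // a'' = (a + b) - a = b
}
```

With a=200 and b=100, the first step wraps to 44, yet the final result is still a=100 and b=200. Neither version is faster than using tmp on modern compilers, which is why the trick is mostly a curiosity.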
Characters in C (char) are described by this wonderful mapping (more details about ASCII codes here):
Without getting deeper into the complicated realm of character encoding, we can undoubtedly say that your average char in C is a number. All the printable char symbols are found in the interval [32..126].
The uppercase letters are found inside the interval [65..90], while the lowercase letters are found inside the interval [97..122]. Because chars are numbers, we can use bitwise operations on them.
The uppercase and lowercase forms of a letter have identical bit patterns, except for the column corresponding to 1<<5 (lowercase letters come exactly 32 positions after their uppercase counterparts). So with the right mask in place, we can switch back and forth between lowercase and uppercase. But what is the right mask? Well, it's exactly the one corresponding to 0x20, which is the ' ' (space) character.
So if we take an uppercase letter and apply | ' ', we obtain a lowercase letter, because we activate the bit corresponding to 1<<5.
char *p = "ABCDEFGH";
while(*p) {
printf("%c", *p | ' ');
p++;
}
// Output
// abcdefgh
If, on the contrary, we take a lowercase letter and apply & '_' (which corresponds to 0b01011111), we "eliminate" the 1<<5 bit and transform the letter into its uppercase form:
char *p = "abcdefgh";
while(*p){
printf("%c", *p & '_');
p++;
}
// Output
// ABCDEFGH
If we want to toggle the case, we use ^ ' ' (XOR <space>):
char *p = "aBcDeFgH";
while(*p) {
printf("%c", *p ^ ' ');
p++;
}
// Output
// AbCdEfGh
Other bitwise tricks involving char that you might find interesting are:
// Getting the lowercase letter position in the alphabet
for(char i = 'a'; i <= 'z'; i++) {
printf("%c --> %d\n", i, (i ^ '`'));
}
// Output
// a --> 1
// b --> 2
// c --> 3
// d --> 4
// e --> 5
// f --> 6
// g --> 7
// ... and so on
// Getting the uppercase letter position in the alphabet
for(char i = 'A'; i <= 'Z'; i++) {
printf("%c --> %d\n", i, (i & '?'));
}
// Output
// A --> 1
// B --> 2
// C --> 3
// D --> 4
// E --> 5
// F --> 6
// G --> 7
// ... and so on
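All these tricks assume the input is an ASCII letter. Applied to other characters, they flip bit 5 anyway: '1' & '_' gives 0x11, a control character, and '@' | ' ' gives '`'. The sketch below adds a range check so non-letters pass through unchanged. The function names are ours, not from the article or the C library (the standard tolower/toupper from <ctype.h> are the usual, locale-aware alternative).

```c
#include <stdbool.h>

static bool is_ascii_upper(char c) { return c >= 'A' && c <= 'Z'; }
static bool is_ascii_lower(char c) { return c >= 'a' && c <= 'z'; }

// | ' ' sets bit 5 (0x20): uppercase -> lowercase, letters only.
char ascii_to_lower(char c) { return is_ascii_upper(c) ? (char)(c | ' ') : c; }

// & '_' clears bit 5: lowercase -> uppercase, letters only.
char ascii_to_upper(char c) { return is_ascii_lower(c) ? (char)(c & '_') : c; }

// ^ ' ' flips bit 5: toggles the case, letters only.
char ascii_toggle_case(char c) {
    return (is_ascii_upper(c) || is_ascii_lower(c)) ? (char)(c ^ ' ') : c;
}
```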
Gray Code, also known as Reflected Binary Code, is a binary numeral system in which two consecutive numbers differ in only one bit position. This way of representing binary numbers is useful in applications such as electronics, where errors due to spurious output transitions can be eliminated by using Gray Code instead of the traditional binary code (the one we've used until now).
The table of correspondence looks like this:
Decimal | Binary | Gray Code | Decimal Gray |
---|---|---|---|
0 | 0000 | 0000 | 0 |
1 | 0001 | 0001 | 1 |
2 | 0010 | 0011 | 3 |
3 | 0011 | 0010 | 2 |
4 | 0100 | 0110 | 6 |
5 | 0101 | 0111 | 7 |
6 | 0110 | 0101 | 5 |
7 | 0111 | 0100 | 4 |
8 | 1000 | 1100 | 12 |
9 | 1001 | 1101 | 13 |
10 | 1010 | 1111 | 15 |
11 | 1011 | 1110 | 14 |
12 | 1100 | 1010 | 10 |
13 | 1101 | 1011 | 11 |
14 | 1110 | 1001 | 9 |
15 | 1111 | 1000 | 8 |
The algorithm for converting a binary number into its Gray Code is as follows:
uint8_t gray_code(uint8_t bv) {
// 1. We shift the binary number one bit to the right
// 2. We XOR it with the original number: each Gray bit is the XOR
//    of a binary bit and its left neighbour
// 3. We return the corresponding Gray Code
return (uint8_t)(bv ^ (bv >> 1));
}
If we run the code, it will return the previously described table of correspondence:
for(uint8_t i = 0; i < 16; i++) {
print_bits(stdout, i);
printf(" - ");
print_bits(stdout, gray_code(i));
printf("\n");
}
// Output
// 00000000 - 00000000
// 00000001 - 00000001
// 00000010 - 00000011
// 00000011 - 00000010
// 00000100 - 00000110
// 00000101 - 00000111
// 00000110 - 00000101
// ... and so on
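Going back from Gray Code to binary is also a bitwise one-liner in a loop. Each binary bit is the XOR of all Gray bits at or above its position, so we XOR together every right shift of the input (the gv ^= mask; mask >>= 1; pattern). The sketch below pairs both directions and is checked by a round trip. The name gray_decode is ours.

```c
#include <stdint.h>

// Binary -> Gray: each Gray bit is the XOR of a binary bit and its left neighbour.
uint8_t gray_code(uint8_t bv) {
    return (uint8_t)(bv ^ (bv >> 1));
}

// Gray -> binary: XOR the input with all of its right shifts.
uint8_t gray_decode(uint8_t gv) {
    uint8_t bv = 0;
    for (uint8_t mask = gv; mask; mask >>= 1) {
        bv ^= mask;
    }
    return bv;
}
```

For example, gray_code(4) is 6 (0110), and gray_decode(6) gives back 4.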
A less obvious use of Gray Codes is generating permutations of a given set of elements (such as the characters of a string or the elements of an array). The basic idea is similar: just as two consecutive Gray Codes differ in a single bit, two consecutive permutations in such a sequence differ by a single swap of adjacent elements. By iterating through the sequence, we perform one swap at a time and, in doing so, generate each permutation.
The world of bitwise tricks is much bigger than what was covered in this article. Thank you for reading up until this point. Please check the References section below to see more on the topic.
If you are curious to see some of my code where I used bitwise operations intensively, please check out the following two articles:
This article has been discussed on:
There was a time when (even) the introduction of \(0\) broke the mathematics of the day. At first, \(0\) was highly pathological. For example, the 7th-century mathematician Brahmagupta said about division by zero:
A positive or negative number when divided by zero is a fraction with the zero as denominator. Zero divided by a negative or positive number is either zero or is expressed as a fraction with zero as numerator and the finite quantity as denominator. Zero divided by zero is zero.
Two hundred years later, another Indian mathematician, Mahāvīra, made a different error while refuting the assertions of his predecessor:
A number remains unchanged when divided by zero.
In our time, we have no problem bending both \(0\) and \(\infty\) to our mathematical needs; mathematicians have always found a way to incorporate the seemingly impossible into broader theories. The same story goes for irrational numbers, complex numbers, and all the other peculiarities discovered and accepted through the centuries.
In Computer Science, Pathological Input is a slightly different concept, strongly linked to the study of algorithms and data structures. In this case, the input causes atypical behavior for an algorithm, such as a violation of its average-case complexity.
In this article, we will generate pathological input for Java HashMaps and see how badly they perform once hit with our malicious set of keys. Unlike a real-world scenario, our pathological set will contain only values that are highly susceptible to collisions. The way to do that is to "reverse engineer" the way the Object::hashCode() method works in Java for our key type.
For example, String::hashCode() looks like this:
// This is the internal representation of a String
// as an array of bytes.
private final byte[] value;
//... more code here
public int hashCode() {
int h = hash;
if (h == 0 && !hashIsZero) {
h = isLatin1() ? StringLatin1.hashCode(value)
: StringUTF16.hashCode(value);
if (h == 0) {
hashIsZero = true;
} else {
hash = h;
}
}
return h;
}
Going further into StringLatin1.hashCode(), we will see how simple the actual hashing function used in Java is. It's a straightforward implementation of a polynomial hash function.
public static int hashCode(byte[] value) {
int h = 0;
for (byte v : value) {
h = 31 * h + (v & 0xff);
}
return h;
}
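To experiment with this polynomial outside the JVM, the loop can be replicated in C. The sketch below assumes single-byte (US_ASCII) input, as in the rest of the article. Java's int arithmetic wraps modulo 2^32, so we compute in uint32_t, where wrapping is well defined in C, and only convert to a signed 32-bit value at the end. The name java_string_hash is ours. This is a reimplementation for illustration, not Java's code.

```c
#include <stddef.h>
#include <stdint.h>

// Sketch: h = 31 * h + (v & 0xff) over the bytes of a NUL-terminated
// ASCII string, returning the same 32-bit value Java would.
int32_t java_string_hash(const char *s) {
    uint32_t h = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        h = 31u * h + (uint8_t)s[i];  // wraps modulo 2^32, like Java's int
    }
    // Map [2^31, 2^32) onto the negative range without relying on
    // implementation-defined unsigned-to-signed conversion.
    if (h <= INT32_MAX) {
        return (int32_t)h;
    }
    return (int32_t)(h - 2147483648u) - INT32_MAX - 1;
}
```

For instance, java_string_hash("a") is 97 and java_string_hash("aa") is 31*97+97 = 3104.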
As you can see, hashCode() is computed in a series of steps, each result (h) depending on the previous one. So from a mathematical perspective, our code looks like this:
\[h_{0}=0 \\ h_{i}=31*h_{i-1}+v_{i}, \quad 1 \leq i \leq N \\ H=h_{N}\]
Where:
- \(N\) is the length of the byte[] value array. In coding terms, \(N\) is the value.length. It is also the number of characters of our String, because we are going to use the US_ASCII charset. In this case, each character will be represented as a single byte.
- \(v_{i}\) is the i-th element of the byte[] value array, where \(1 \leq i \leq N\). So think of our String as equivalent to the ordered set \((v_{1}, v_{2}, ... , v_{i}, ... v_{N})\).
- \(h_{0}\) is the initial value of h (initially int h=0).
- \(h_{i}\) is the hashCode() value computed in the loop when iterating over \(v_{i}\) (at step \(i\)).
- \(H=h_{N}\) is the final hashCode() value, the one returned by the Java function (at step \(N\), when the main loop ends).
\(H\) can also be written as the sum: \(H=h_{N}=\sum_{i=1}^{N} v_{i}*31^{N-i}\). Because Java's int arithmetic overflows silently, these equalities hold modulo \(2^{32}\).
If \(N=1\), the formula becomes \(H=h_{1}=v_{1}\). It means that the hashCode() of a single-character String is the actual byte value of its character (\(v_{1}\)).
If we run the following code, we will see that our “assumption” is correct:
String a = "a";
String b = "b";
System.out.println("The byte representation of 'a' is: " + a.getBytes(StandardCharsets.US_ASCII)[0]);
System.out.println("The hashCode representation of 'a' is:" + a.hashCode());
System.out.println("The byte representation of 'b' is: " + b.getBytes(StandardCharsets.US_ASCII)[0]);
System.out.println("The hashCode representation of 'b' is: " + b.hashCode());
With the output:
The byte representation of 'a' is: 97
The hashCode representation of 'a' is:97
The byte representation of 'b' is: 98
The hashCode representation of 'b' is: 98
Because of this, it's impossible to generate colliding single-character Strings (when \(N=1\)): different characters have different byte representations, so they have different hashes. But the situation becomes much more interesting when \(N \geq 2\).
So let's take the case when \(N=2\). Our formula for obtaining the hashCode() of two-character Strings becomes:
\[H=h_{2}=31*v_{1}+v_{2}\]
Here, \(v_{1}\) is the byte value of the first character, and \(v_{2}\) is the byte value of the second character of the String. And here comes the funny part: we can find various values for \(v_{1}\) and \(v_{2}\) so that \(H\) remains the same, thus creating a collision.
There are multiple combinations of numbers \(v_{1}^{i}\) and \(v_{2}^{i}\) for which \(H=h_{2}=31*v_{1}^{i}+v_{2}^{i}\) holds, because \(H\) can be written as:
\[H=31*(v_{1} + 0) + (v_{2} - 31*0) \\ H=31*(\underbrace{v_{1}+1}_{v_{1}^{1}}) + (\underbrace{v_{2}-31*1}_{v_{2}^{1}}) \\ H=31*(\underbrace{v_{1}+2}_{v_{1}^{2}}) + (\underbrace{v_{2}-31*2}_{v_{2}^{2}}) \\ H=31*(\underbrace{v_{1}+3}_{v_{1}^{3}}) + (\underbrace{v_{2}-31*3}_{v_{2}^{3}}) \\ \vdots \\ H=31*(\underbrace{v_{1}+i}_{v_{1}^{i}}) + (\underbrace{v_{2}-31*i}_{v_{2}^{i}}) \\ \vdots \\ \text{ and so on ...}\]
So, theoretically, there is an infinite number of pairs \(v_{1}^{i}\) and \(v_{2}^{i}\) that satisfy \(H=h_{2}=31*v_{1}^{i}+v_{2}^{i}\). To determine them, we only need to apply the following formulas to the initial two characters, \(v_{1}\) and \(v_{2}\):
\[v_{1}^{i}=v_{1} + i \\ v_{2}^{i}=v_{2} - 31*i\]
In practice, we will use the US_ASCII encoding and keep only printable characters, so our values must stay in the interval [32, 126].
Applying the formulas, it becomes pretty straightforward to determine all the possible colliding 2-character Strings for a given String:
public static Set<String> getCollidingStrings(String srcString) {
byte[] src = srcString.getBytes(US_ASCII);
if (src.length != 2) {
throw new IllegalArgumentException("The string should have exactly two characters");
}
HashSet<String> result = new HashSet<>();
result.add(srcString);
// Java bytes are signed (-128..127), so we do the arithmetic on ints
int v1 = src[0];
int v2 = src[1];
while(true) {
v1 += 1; // we increment v_{1} with 1
v2 -= 31; // we decrement v_{2} with 31
if (v1 > 126 || v2 < 32) break; // we exit the printable ASCII range (break loop)
result.add(new String(new byte[]{(byte) v1, (byte) v2}, US_ASCII)); // we add the result
}
return result;
}
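The same enumeration can be sketched in C to double-check the math. Starting from (v1, v2), every pair (v1 + i, v2 - 31*i) keeps 31*v1 + v2 unchanged, and we stop as soon as a character leaves the printable range [32, 126]. The function name colliding_pairs and its output-buffer convention are our own. The caller provides room for max strings of 3 chars (two characters plus the terminator), and the input is assumed to be two printable characters.

```c
#include <stddef.h>

// Sketch: writes the colliding two-character strings of 'src' (including
// 'src' itself) into 'out' and returns how many were produced.
size_t colliding_pairs(const char src[3], char out[][3], size_t max) {
    int v1 = (unsigned char)src[0];
    int v2 = (unsigned char)src[1];
    size_t n = 0;
    for (int i = 0; n < max; i++) {
        int c1 = v1 + i;       // v_1^i = v_1 + i
        int c2 = v2 - 31 * i;  // v_2^i = v_2 - 31*i
        if (c1 > 126 || c2 < 32) break;  // left the printable ASCII range
        out[n][0] = (char)c1;
        out[n][1] = (char)c2;
        out[n][2] = '\0';
        n++;
    }
    return n;
}
```

For "aa", this yields "aa", "bB" and "c#", the same three Strings as the Java version.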
Let's pick "aa" as the srcString:
Set<String> result = getCollidingStrings("aa");
result.forEach(s -> {
System.out.printf("s='%s'\n", s);
System.out.printf("'%s'.hashCode()=%d\n", s, s.hashCode());
});
Output:
s='aa'
'aa'.hashCode()=3104
s='bB'
'bB'.hashCode()=3104
s='c#'
'c#'.hashCode()=3104
So those 3 Strings are, in a way, pathological. But they are insufficient to prove our point: a HashMap<String, T> can easily deal with three colliding elements. We need to generate significantly more collisions by building longer colliding Strings.
The exciting aspect is that we can reuse "aa", "bB", and "c#" to build longer inputs by concatenating them in different combinations. The math behind this last assumption is quite simple.
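Let's generate four-character strings that collide (\(N=4\)). The formula for the hashCode becomes:
\[H=h_{4}=31^{3}*v_{1}+31^{2}*v_{2}+31*v_{3}+v_{4}=31^{2}*\underbrace{(31*v_{1}+v_{2})}_{H_{1}}+\underbrace{(31*v_{3}+v_{4})}_{H_{2}}\]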
So if you look closer, what we have to do now is find pairs of values (\(v_{3}^{i}\), \(v_{4}^{i}\)) and (\(v_{1}^{i}\), \(v_{2}^{i}\)) for which \(H_{2}\) and, respectively, \(H_{1}\) remain constant. It's the same exercise we did before when \(N=2\).
We can generate new ones, or we can reuse the ones we've discovered so far:
\[(v_{1}^{i}, v_{2}^{i}) \in \{\text{"aa", "bB", "c#"}\} \\ (v_{3}^{i}, v_{4}^{i}) \in \{\text{"aa", "bB", "c#"}\}\]
So if we rearrange the Strings obtained when \(N=2\), we can create longer Strings that, when hashed with Java's algorithm, will collide:
System.out.println("aaaa".hashCode());
System.out.println("aabB".hashCode());
System.out.println("aac#".hashCode());
System.out.println("bBaa".hashCode());
System.out.println("bBbB".hashCode());
System.out.println("bBc#".hashCode());
System.out.println("c#aa".hashCode());
System.out.println("c#bB".hashCode());
System.out.println("c#c#".hashCode());
Output:
2986048
2986048
2986048
2986048
2986048
2986048
2986048
2986048
2986048
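We can also use other values for our \((v_{3}^{i}, v_{4}^{i})\). For example, let's find the colliding Strings for "go" by calling getCollidingStrings("go"). The results are "go", "hP", and "i1". So let's use:
\[(v_{1}^{i}, v_{2}^{i}) \in \{\text{"aa", "bB", "c#"}\} \\ (v_{3}^{i}, v_{4}^{i}) \in \{\text{"go", "hP", "i1"}\}\]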
By re-combining the Strings obtained with \(N=2\), we will get another set of four-character Strings that collide when hashed:
System.out.println("aago".hashCode());
System.out.println("aahP".hashCode());
System.out.println("aai1".hashCode());
System.out.println("bBgo".hashCode());
System.out.println("bBhP".hashCode());
System.out.println("bBi1".hashCode());
System.out.println("c#go".hashCode());
System.out.println("c#hP".hashCode());
System.out.println("c#i1".hashCode());
Output:
2986248
2986248
2986248
2986248
2986248
2986248
2986248
2986248
2986248
With a base set of three colliding two-character Strings, the number of colliding Strings we can generate using this algorithm is \(3^{N/2}\). If \(N=32\) (our Strings are 32 characters long), we can generate \(3^{32/2}=3^{16}=43{,}046{,}721\) colliding values. That is more than enough to ruin the performance of a HashMap<K,V> by degrading it to the performance of a TreeMap<K,V> for get/put operations (since Java 8, a HashMap bucket holding many colliding keys is converted into a balanced tree).
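So let's define the set \(A=\text{\{"aa", "bB", "c#"\}}\) containing three Strings that collide. The colliding Strings of length \(N\) are then the concatenations of the elements of the Cartesian product \(A^{N/2}=A \times A \times \dots \times A\) (\(N/2\) times).
If we use Guava, we can (re)use Sets.cartesianProduct(Set...) to implement the algorithm. If not, you can implement the Cartesian product yourself.
So our code becomes: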
public static Set<String> generateCollisions(Set<String> baseSet, int nTimes) {
Set<String>[] sets = new Set[nTimes];
Arrays.fill(sets, baseSet); // fill-up nTimes the array with baseSet
return Sets.cartesianProduct(sets)
.stream()
.map(s -> String.join("",s))
.collect(Collectors.toSet());
}
Where:
- baseSet is obtained by calling the previously defined method getCollidingStrings();
- nTimes is the number of copies of baseSet in the Cartesian product. If, for example, nTimes=13, we will generate 26-character-long Strings by taking the Cartesian product of 13 baseSets. The number of elements generated is 3^nTimes=1594323.
To test the code above, let's run the following:
public static void main(String[] args) {
Set<String> baseSet = getCollidingStrings("aa"); // "aa", "bB", "c#"
Set<String> collidingStrings = generateCollisions(baseSet, 13);
System.out.println("Strings generated: " + collidingStrings.size());
System.out.println("Size of the String: " + collidingStrings
.iterator()
.next()
.length());
System.out.println("Distinct hash values: " + collidingStrings
.stream().map(String::hashCode)
.collect(Collectors.toSet())
.size());
}
The output will be:
Strings generated: 1594323
Size of the String: 26
Distinct hash values: 1
The elements of collidingStrings look like this:
c#c#c#c#c#c#bBc#c#c#c#bBc#,
c#c#c#c#c#c#bBc#c#c#c#c#aa,
c#c#c#c#c#c#bBc#c#c#c#c#bB,
c#c#c#c#c#c#bBc#c#c#c#c#c#,
c#c#c#c#c#c#c#aaaaaaaaaaaa,
c#c#c#c#c#c#c#aaaaaaaaaabB,
c#c#c#c#c#c#c#aaaaaaaaaac#,
c#c#c#c#c#c#c#aaaaaaaabBaa,
c#c#c#c#c#c#c#aaaaaaaabBbB,
c#c#c#c#c#c#c#aaaaaaaabBc#,
c#c#c#c#c#c#c#aaaaaaaac#aa,
c#c#c#c#c#c#c#aaaaaaaac#bB,
c#c#c#c#c#c#c#aaaaaaaac#c#,
c#c#c#c#c#c#c#aaaaaabBaaaa,
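Without Guava, the Cartesian product can be generated by counting in base 3, like an odometer. The index 0 .. 3^k - 1, written in base 3, picks one element of the base set for each of the k blocks. The sketch below is in C; nth_collision and java_hash_u32 are hypothetical helpers. The latter recomputes Java's polynomial modulo 2^32 only to check that the generated Strings collide.

```c
#include <stdint.h>
#include <string.h>

// Java's String.hashCode() polynomial for ASCII strings, modulo 2^32.
uint32_t java_hash_u32(const char *s) {
    uint32_t h = 0;
    while (*s) h = 31u * h + (unsigned char)*s++;
    return h;
}

// Writes the index-th element of the k-fold Cartesian product of 'base'
// (three two-character strings) into 'out', which needs 2*k + 1 bytes.
// The base-3 digits of 'index' select the block contents, least
// significant digit last, so index 0 is base[0] repeated k times.
void nth_collision(const char *const base[3], unsigned k, unsigned long index, char *out) {
    for (unsigned block = k; block-- > 0; ) {
        memcpy(out + 2 * block, base[index % 3], 2);
        index /= 3;
    }
    out[2 * k] = '\0';
}
```

With base = {"aa", "bB", "c#"} and k = 2, the indices 0 to 8 produce the nine four-character Strings listed earlier, index 5 being "bBc#". With k = 13, the indices 0 to 1594322 produce the 26-character Strings.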
Conclusions:
- If we use these pathological Strings as keys in a HashMap<String, T>, the performance of HashMap would degrade to that of a TreeMap.
Thanks for reading so far!
… actually you can use only 2, but this will make your life a little more miserable.
After not implementing a game of snake in ages, I've decided to do my best today, but with some strange and absurd limitations in mind, you know, to spice things up:
- The map is a uint32_t, where 1s will form the reptile's body. The map will contain 4x8 positions. Enough to have fun!
- A uint64_t serves as a directions array. This will be useful to move the snake around while keeping its growing shape intact.
- Another uint32_t keeps the positions of the head, the tail, the apple, and the (current) length. Any input from the keyboard will also be kept here (2 bits will be enough).
- One 8-bit (uint8_t) variable is used for looping.
Because there's no standard C way of interacting with the keyboard, I will have to rely on curses, so if you want to compile the program, make sure you have the library installed on your system. If you're using the right type of operating system, chances are it's already there. If not, you can certainly install it from your favorite package manager.
Unfortunately, curses uses additional memory by itself, but let's be honest: hacking with arcane escape chars and low-level system functions is not fun, and certainly not something that I am willing to try by myself. Yes, it's cheating, and this article is a fraud!
Before you continue reading (if you haven't stopped by now), note that the code should be taken as a twisted joke, as an exercise in minimalism, or as both (probably a joke). Because of the aforementioned limitations, we are going to write some nasty macros to perform bitwise operations, use global variables, reuse the same counter, etc. This is not a good example of readable or elegant code.
Everything is available on GitHub:
git clone git@github.com:nomemory/integers-snake.git
To compile and run the program:
gcc -Wall snake.c -lcurses && ./a.out
We will start by defining the 4 integers that will hold all our game data:
uint32_t map = ...;
uint32_t vars = ...;
uint64_t shape = ...;
int8_t i = ...;
map
map is what we are going to display on the screen. The 32 bits of map will form a 4x8 grid that will be rendered using curses:
To access the memory and set bits to 0 or 1, we can use the following macros:
// 1u (unsigned): shifting a signed 1 into bit 31 would be undefined behavior
#define s_is_set(b) ((map&(1u<<(b)))!=0) // checks if the b bit from the map is set to 1
#define s_tog(b) (map^=(1u<<(b))) // toggles the b bit of the map (currently not used)
#define s_set_0(b) (map&=~(1u<<(b))) // sets to 0 the b bit from the map
#define s_set_1(b) (map|=(1u<<(b))) // sets to 1 the b bit from the map
vars
vars is a 32-bit integer where we will keep the following data:
- hpos (from bit 0 to 4) represents the head position of the snake as an offset from the map's LSB;
- tpos (from bit 5 to 9) represents the tail position of the snake as an offset from the map's LSB;
- len (from bit 10 to 14) represents the length of the snake;
- apos (from bit 15 to 19) represents the apple position as an offset from the map's LSB;
- chdir (from bit 20 to 21) represents the last key pressed; 2 bits are enough, because only arrows are registered, and there are 4 of them;
- the remaining bits (22 to 31) are unused; we could have kept the uint8_t counter here, but for simplicity I've chosen to use a separate variable.
counter here, but for simplicity I’ve chosen to pick a separate variable;To access hpos
, tpos
, etc. we have defined the following macros. Each of them works like a getter/setter for the corresponding segments:
// s_ls_bits(len), not shown here, yields a value whose 'len' least significant bits are set to 1
#define s_mask(start,len) (s_ls_bits(len)<<(start)) // creates a bitmask of len bits starting from position start
#define s_prep(y,start,len) (((y)&s_ls_bits(len))<<(start)) // keeps the len low bits of y and moves them to position start
// Gets the 'len' number of bits, starting from position 'start' of 'y'
#define s_get(y,start,len) (((y)>>(start))&s_ls_bits(len))
// Sets the 'len' number of bits, starting from position 'start' of 'x', to the value 'bf'
#define s_set(x,bf,start,len) (x=((x)&~s_mask(start,len))|s_prep(bf,start,len))
#define s_hpos s_get(vars,0,5) // gets bits 0-4 of 'vars', which correspond to hpos
#define s_tpos s_get(vars,5,5) // gets bits 5-9 of 'vars', which correspond to tpos
#define s_len s_get(vars,10,5)
#define s_apos s_get(vars,15,5)
#define s_chdir s_get(vars,20,2)
#define s_hpos_set(pos) s_set(vars,pos,0,5)
#define s_tpos_set(pos) s_set(vars,pos,5,5)
#define s_len_set(len) s_set(vars,len,10,5)
#define s_apos_set(app) s_set(vars,app,15,5)
#define s_chdir_set(cdir) s_set(vars,cdir,20,2)
#define s_len_inc s_len_set(s_len+1)
For more information about the technique behind the macros, please read the following article: Working with bits and bitfields.
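The getter/setter macros above come down to two operations. To read a segment, you shift and mask. To write one, you clear it and then OR in the new value. Here is a minimal, self-contained sketch of the same idea using unsigned constants. The names are hypothetical (LS_BITS, BF_GET, BF_SET, vars_pack), and the sketch assumes every segment is shorter than 32 bits.

```c
#include <stdint.h>

/* Mask with the len lowest bits set; 1u keeps the shift unsigned (len < 32 assumed) */
#define LS_BITS(len) ((1u << (len)) - 1u)
/* Reads len bits of y, starting at bit start */
#define BF_GET(y, start, len) (((y) >> (start)) & LS_BITS(len))
/* Clears len bits of x at start, then writes the low len bits of v there */
#define BF_SET(x, v, start, len) \
    ((x) = ((x) & ~(LS_BITS(len) << (start))) | (((v) & LS_BITS(len)) << (start)))

/* Same layout as the article: hpos 0-4, tpos 5-9, len 10-14, apos 15-19, chdir 20-21 */
static uint32_t vars_pack(uint32_t hpos, uint32_t tpos, uint32_t len,
                          uint32_t apos, uint32_t chdir) {
    uint32_t v = 0;
    BF_SET(v, hpos, 0, 5);
    BF_SET(v, tpos, 5, 5);
    BF_SET(v, len, 10, 5);
    BF_SET(v, apos, 15, 5);
    BF_SET(v, chdir, 20, 2);
    return v;
}
```

Packing hpos = 10, tpos = 8, len = 2, apos = 0 and chdir = 2 (SL) gives 0x20090a, the same vars constant that shows up in the expanded listing later on. BF_SET masks the value to len bits, so writing 33 into a 5-bit segment stores 1. The game relies on this same wrap-around.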
shape
shape keeps the directions for each cell of the snake. 2 bits per direction are enough, so the 64 bits of shape can keep a total of 32 directions:
The possible directions are mapped using the following macros:
#define SU 0 //UP
#define SD 1 //DOWN
#define SL 2 //LEFT
#define SR 3 //RIGHT
Each time the snake moves inside the map grid, we cycle through the directions with the following macros:
#define s_hdir ((shape>>(s_len*2)&3)) // retrieves the head direction (based on s_len)
#define s_tdir (shape&3) // retrieves the last 2 bits which corresponds to the tail
#define s_hdir_set(d) s_set(shape,d,s_len*2,2) // sets the head direction
#define s_tdir_set(d) s_set(shape,d,0,2) // sets the tail direction
// Macros for changing the shape each time the snake moves
#define s_shape_rot(nd) do { shape>>=2; s_hdir_set(nd); } while(0);
#define s_shape_add(nd) do { s_len_inc; shape<<=2; s_tdir_set(nd); } while(0);
When the snake moves without eating an apple, we call the s_shape_rot macro. It removes the last direction (the tail) and pushes a new head (based on s_chdir).
In this regard, shape behaves like a queue:
When the snake moves and eats an apple, we call s_shape_add. It increases the length and pushes a new tail (s_tdir).
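Here is a standalone sketch of why shape behaves like a queue. It implements the two operations with hypothetical helper names (slot_get, slot_set, shape_rot) and the same 2-bit direction codes. One detail differs from the expanded listing. There, the head mask is built as ((1<<(2))-1) shifted by s_len*2, which is an int expression. On platforms with a 32-bit int, that shift overflows once s_len reaches 15. This sketch uses 3ULL instead, so the mask stays 64 bits wide.

```c
#include <stdint.h>

enum { SU = 0, SD = 1, SL = 2, SR = 3 };

/* Direction stored in slot idx (2 bits per slot, slot 0 = tail) */
static unsigned slot_get(uint64_t shape, unsigned idx) {
    return (unsigned)((shape >> (idx * 2)) & 3u);
}

/* Overwrites slot idx; 3ULL keeps the mask 64 bits wide for idx up to 31 */
static uint64_t slot_set(uint64_t shape, unsigned idx, unsigned d) {
    shape &= ~(3ULL << (idx * 2));
    return shape | ((uint64_t)(d & 3u) << (idx * 2));
}

/* Like s_shape_rot: the tail slot is shifted out, the new head goes to slot len */
static uint64_t shape_rot(uint64_t shape, unsigned len, unsigned new_dir) {
    shape >>= 2;
    return slot_set(shape, len, new_dir);
}
```

Start from shape = 0x2a (three cells, all SL, as in the expanded listing) with len = 2. One call to shape_rot(shape, 2, SU) yields 0xa. The oldest SL has been shifted out, and slot 2 now holds SU (0).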
The game loop looks like this:
// Some macros to make the code more readable
// (or unreadable depending on you)
#define s_init do { srand(time(0)); initscr(); keypad(stdscr, TRUE); cbreak(); noecho(); } while(0);
#define s_exit(e) do { endwin(); exit(e); } while(0);
#define s_key_press(k1, k2) if (s_hdir==k2) break; s_chdir_set(k1); break;
int main(void) {
s_init; // initialize the curses context
rnd_apple(); // creates a random position for the apple
while(1) {
show_map(); // renders the map on screen
timeout(80); // getch() gives up after waiting 80 ms for user input
switch (getch()) {
case KEY_UP : { s_key_press(SU, SD) };
case KEY_DOWN : { s_key_press(SD, SU) };
case KEY_LEFT : { s_key_press(SL, SR) };
case KEY_RIGHT : { s_key_press(SR, SL) };
case 'q' : s_exit(0); // Quits the game (s_exit also restores the terminal)
}
move_snake(); // The snake moves inside the grid
s_shape_rot(s_chdir); // The shape is getting updated
napms(200); // frame rate :))
}
s_exit(0); // the game exits
}
Each time an arrow key is pressed, the code that s_key_press expands to runs. It checks if the movement is possible, and then updates s_chdir (using s_chdir_set).
The reason s_key_press has two input parameters is to exclude the opposite direction. For example, if the snake is currently moving to the RIGHT (SR), a SL is not possible, and thus we break out of the switch.
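Because of how the directions are numbered, the pair passed to s_key_press is always (d, d ^ 1). Flipping the lowest bit of a direction gives its opposite. Here is a minimal sketch of that observation. The helpers opposite and next_dir are hypothetical and not part of the original code.

```c
/* Same encoding as the article: SU=0, SD=1, SL=2, SR=3.
   UP/DOWN and LEFT/RIGHT differ only in their lowest bit. */
enum { SU = 0, SD = 1, SL = 2, SR = 3 };

/* The opposite of a direction: flip the lowest bit */
static int opposite(int d) { return d ^ 1; }

/* Mirrors s_key_press(k1, k2): keep the current direction if the
   requested one would reverse the snake onto itself */
static int next_dir(int current, int requested) {
    return requested == opposite(current) ? current : requested;
}
```

So s_key_press(SU, SD) could, in principle, be written as s_key_press(SU, SU ^ 1). The article spells out both arguments, which is arguably easier to read.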
move_snake() is where most of our logic is implemented:
#define s_next_l s_mask5(s_hpos+1) // incrementing the offset to go left (higher bits are drawn further left)
#define s_next_r s_mask5(s_hpos-1) // decrementing the offset to go right
#define s_next_u s_mask5(s_hpos+8) // change row up, by adding 8 positions to the offset
#define s_next_d s_mask5(s_hpos-8) // change row down, by removing 8 positions from the offset
// Check if a left movement is possible.
static void check_l() { if ((s_mod_p2(s_next_l,8) < s_mod_p2(s_hpos,8)) || s_is_set(s_next_l)) s_exit(-1); }
// Check if a right movement is possible.
static void check_r() { if ((s_mod_p2(s_next_r,8) > s_mod_p2(s_hpos,8)) || s_is_set(s_next_r)) s_exit(-1); }
// Check if an up movement is possible
static void check_u() { if ((s_next_u < s_hpos) || s_is_set(s_next_u)) s_exit(-1); }
// Check if a down movement is possible
static void check_d() { if ((s_next_d > s_hpos) || s_is_set(s_next_d)) s_exit(-1); }
static void move_snake() {
if (s_hdir==SL) { check_l(); s_hpos_set(s_hpos+1); }
else if (s_hdir==SR) { check_r(); s_hpos_set(s_hpos-1); }
else if (s_hdir==SU) { check_u(); s_hpos_set(s_hpos+8); }
else if (s_hdir==SD) { check_d(); s_hpos_set(s_hpos-8); }
// Sets the bit based on the current s_hdir and s_hpos
s_set_1(s_hpos);
// If an apple is eaten
if (s_apos==s_hpos) {
// We generate another apple so we don't starve
rnd_apple();
// Append to the tail
s_shape_add(s_tdir);
// We skip clearing the tail bit, so the snake grows
return;
}
// Clear the tail bit
s_set_0(s_tpos);
// Update tpos so we can clear the next tail bit when the snake moves
if (s_tdir==SL) { s_tpos_set(s_tpos+1); }
else if (s_tdir==SR) { s_tpos_set(s_tpos-1); }
else if (s_tdir==SU) { s_tpos_set(s_tpos+8); }
else if (s_tdir==SD) { s_tpos_set(s_tpos-8); }
}
To validate whether the snake can move in the grid, we’ve implemented the check_*() functions:
check_l() - the X coordinate of the next position (its offset modulo 8) must be bigger than the current one; if it’s smaller, the snake wrapped around the row;
check_r() - the X coordinate of the next position must be smaller than the current one;
check_u() and check_d() work in the same way: they check whether adding 8 to s_hpos (or subtracting 8 from it) overflows the 5 bits. If it does, then it means we’ve exited the grid. Overflows are used as features.
Each check also ends the game if the next cell is already occupied by the snake (s_is_set).
This is the last function we are going to implement:
static void show_map() {
clear();
i=32;
while(i-->0) { // !! Trigger warning for sensitive people, incoming '-->0'
// If the bit is an apple, we render the apple '@'
if (i==s_apos) { addch('@'); addch(' '); }
// We draw either the snake bit ('#') or the empty bit ('.')
else { addch(s_is_set(i) ? '#':'.'); addch(' '); }
// We construct the grid by inserting a new line
if (!s_mod_p2(i,8)) { addch('\n'); }
};
}
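The wrap-around part of check_l(), check_r(), check_u() and check_d() can be isolated into one pure function. That makes it easy to test without a terminal. This sketch follows the article's conventions: 5-bit offsets, left = +1, right = -1, up = +8, down = -8. leaves_grid and wrap5 are hypothetical names, and the collision test (s_is_set) is left out.

```c
/* Offsets are 5 bits wide (0..31) and the grid has 4 rows of 8 columns.
   As in the article: left = +1, right = -1, up = +8, down = -8. */
static unsigned wrap5(int v) { return (unsigned)v & 0x1fu; } /* keep 5 bits */

/* Returns 1 if moving from pos by delta leaves the grid, 0 otherwise.
   Mirrors the wrap-around part of check_l/check_r/check_u/check_d. */
static int leaves_grid(unsigned pos, int delta) {
    unsigned next = wrap5((int)pos + delta);
    switch (delta) {
    case 1:  return (next % 8) < (pos % 8); /* column wrapped from 7 to 0 */
    case -1: return (next % 8) > (pos % 8); /* column wrapped from 0 to 7 */
    case 8:  return next < pos;             /* offset wrapped past 31 */
    case -8: return next > pos;             /* offset wrapped below 0 */
    default: return 1;                      /* not a single-step move */
    }
}
```

For example, leaves_grid(7, 1) is 1 because column 7 wraps to column 0. By contrast, leaves_grid(23, 8) is 0 because offset 31 is still inside the grid.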
After all the macros are expanded, the resulting code looks like this:
uint32_t map = 0x700;
uint32_t vars = 0x20090a;
uint64_t shape = 0x2a;
int8_t i = 0;
static void rnd_apple() {
i = (rand()&(32 -1));
while(((map&(1<<(i)))!=0)) i = (rand()&(32 -1));
(vars=((vars)&~(((1<<(5))-1)<<(15)))|(((i)&((1<<(5))-1))<<(15)));
}
static void show_map() {
wclear(stdscr);
i=32;
while(i-->0) {
if (i==(((vars)>>(15))&((1<<(5))-1))) { waddch(stdscr,'@'); waddch(stdscr,' '); }
else { waddch(stdscr,((map&(1<<(i)))!=0) ? '#':'.'); waddch(stdscr,' '); }
if (!(i&(8 -1))) { waddch(stdscr,'\n'); }
};
}
static void check_l() { if ((((((((vars)>>(0))&((1<<(5))-1))+1)&0x1f)&(8 -1)) < ((((vars)>>(0))&((1<<(5))-1))&(8 -1))) || ((map&(1<<((((((vars)>>(0))&((1<<(5))-1))+1)&0x1f))))!=0)) do { endwin(); exit(-1); } while(0);; }
static void check_r() { if ((((((((vars)>>(0))&((1<<(5))-1))-1)&0x1f)&(8 -1)) > ((((vars)>>(0))&((1<<(5))-1))&(8 -1))) || ((map&(1<<((((((vars)>>(0))&((1<<(5))-1))-1)&0x1f))))!=0)) do { endwin(); exit(-1); } while(0);; }
static void check_u() { if (((((((vars)>>(0))&((1<<(5))-1))+8)&0x1f) < (((vars)>>(0))&((1<<(5))-1))) || ((map&(1<<((((((vars)>>(0))&((1<<(5))-1))+8)&0x1f))))!=0)) do { endwin(); exit(-1); } while(0);; }
static void check_d() { if (((((((vars)>>(0))&((1<<(5))-1))-8)&0x1f) > (((vars)>>(0))&((1<<(5))-1))) || ((map&(1<<((((((vars)>>(0))&((1<<(5))-1))-8)&0x1f))))!=0)) do { endwin(); exit(-1); } while(0);; }
static void move_snake() {
if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==2) { check_l(); (vars=((vars)&~(((1<<(5))-1)<<(0)))|((((((vars)>>(0))&((1<<(5))-1))+1)&((1<<(5))-1))<<(0))); }
else if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==3) { check_r(); (vars=((vars)&~(((1<<(5))-1)<<(0)))|((((((vars)>>(0))&((1<<(5))-1))-1)&((1<<(5))-1))<<(0))); }
else if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==0) { check_u(); (vars=((vars)&~(((1<<(5))-1)<<(0)))|((((((vars)>>(0))&((1<<(5))-1))+8)&((1<<(5))-1))<<(0))); }
else if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==1) { check_d(); (vars=((vars)&~(((1<<(5))-1)<<(0)))|((((((vars)>>(0))&((1<<(5))-1))-8)&((1<<(5))-1))<<(0))); }
(map|=(1<<(((vars)>>(0))&((1<<(5))-1))));
if ((((vars)>>(15))&((1<<(5))-1))==(((vars)>>(0))&((1<<(5))-1))) {
rnd_apple();
do { (vars=((vars)&~(((1<<(5))-1)<<(10)))|((((((vars)>>(10))&((1<<(5))-1))+1)&((1<<(5))-1))<<(10))); shape<<=2; (shape=((shape)&~(((1<<(2))-1)<<(0)))|((((shape&3))&((1<<(2))-1))<<(0))); } while(0);;
return;
}
(map&=~(1<<(((vars)>>(5))&((1<<(5))-1))));
if ((shape&3)==2) { (vars=((vars)&~(((1<<(5))-1)<<(5)))|((((((vars)>>(5))&((1<<(5))-1))+1)&((1<<(5))-1))<<(5))); }
else if ((shape&3)==3) { (vars=((vars)&~(((1<<(5))-1)<<(5)))|((((((vars)>>(5))&((1<<(5))-1))-1)&((1<<(5))-1))<<(5))); }
else if ((shape&3)==0) { (vars=((vars)&~(((1<<(5))-1)<<(5)))|((((((vars)>>(5))&((1<<(5))-1))+8)&((1<<(5))-1))<<(5))); }
else if ((shape&3)==1) { (vars=((vars)&~(((1<<(5))-1)<<(5)))|((((((vars)>>(5))&((1<<(5))-1))-8)&((1<<(5))-1))<<(5))); }
}
int main(void) {
do { srand(time(0)); initscr(); keypad(stdscr, 1); cbreak(); noecho(); } while(0);;
rnd_apple();
while(1) {
show_map();
wtimeout(stdscr,80);
switch (wgetch(stdscr)) {
case 0403 : { if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==1) break; (vars=((vars)&~(((1<<(2))-1)<<(20)))|(((0)&((1<<(2))-1))<<(20))); break; };
case 0402 : { if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==0) break; (vars=((vars)&~(((1<<(2))-1)<<(20)))|(((1)&((1<<(2))-1))<<(20))); break; };
case 0404 : { if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==3) break; (vars=((vars)&~(((1<<(2))-1)<<(20)))|(((2)&((1<<(2))-1))<<(20))); break; };
case 0405 : { if (((shape>>((((vars)>>(10))&((1<<(5))-1))*2)&3))==2) break; (vars=((vars)&~(((1<<(2))-1)<<(20)))|(((3)&((1<<(2))-1))<<(20))); break; };
case 'q' : do { endwin(); exit(0); } while(0);;
}
move_snake();
do { shape>>=2; (shape=((shape)&~(((1<<(2))-1)<<((((vars)>>(10))&((1<<(5))-1))*2)))|((((((vars)>>(20))&((1<<(2))-1)))&((1<<(2))-1))<<((((vars)>>(10))&((1<<(5))-1))*2))); } while(0);;
napms(200);
}
do { endwin(); exit(0); } while(0);;
}
It’s not a beautiful sight, but it looks mesmerizing. When I scroll through the code, I get motion bitwise sickness.
It was a fun exercise. The full code can be found here; it’s around 100 lines and 4 integers.
If the snake moves too fast on your terminal, you can tweak s_napms by increasing it.
It was an idea crazy enough (in a positive way) to try it myself in C.
So, what if I wrote my own blogging “platform” (for lack of a more suitable term)? But instead of outputting a static HTML site, my platform would output a single executable binary compatible with any *Nix platform. There would be no HTML files and no other assets, just a piece of source code that gets recompiled each time I update my “content”. Everything stays in memory, and my site is (an) executable.
To go entirely minimalistic, I’ve decided to impose on myself additional rules to follow:
no dynamic memory allocation: no calloc(), malloc(), free() and the likes.
That being (optimistically) settled, I realized I had to write my own primitive HTTP web server; nothing exceedingly fancy, but something that supports GET requests.
The last time I touched C socket programming was more than 14 years ago, while I was at university. Reading an entire book on the topic was not an option, but I found something better: Beej’s Network Programming guide. If you are already familiar with C, this tutorial has enough information to get you started, and it only takes a few hours to go through it. I copy-pasted the examples, went through them, modified a few things, and everything worked. End of story.
The next step was to read up a bit on the HTTP protocol. As a backend developer, I have a broad understanding of how it works, but I hadn’t (recently) been in a position to look closely at what the actual messages look like. I found out that Firefox, the RFC, and the DuckDuckGo search engine were my friends.
That being said and done, I was good to go.
I’ve unimaginatively named my blogging platform microblog-c. The code and the samples are available here:
git clone git@github.com:nomemory/microblog-c.git
To build the sample blog, we compile microblog.c:
>> gcc -Wall microblog.c -o microblog
>> ./microblog
If everything goes well, the internal server will start serving HTTP requests on port 8080.
If you are curious to see what everything looks like, open a browser: http://localhost:8080, and enjoy:
As you can see, CSS is not my strongest skill.
There’s no need to touch the microblog.c source code to add new content to the blog.
We start by creating a new file in the ./cnt folder called jimihendrix:
{
.content_type = "text/html",
.body = "<p>Jimmy Hendridx</p>"
"<p>Jimmy Hendrix says hello</p>"
}
Note to self: It’s JIMI, not JIMMY!
Then, we reference the new file in the posts file:
#include "cnt/home" /* 0 */
,
#include "cnt/davidbowie" /* 1 */
,
#include "cnt/ozzyosbourne" /* 2 */
,
#include "cnt/jimihendrix" /* 3 <---- ADD THIS LINE ---> */
Next, we will make the article visible on the homepage by editing ./cnt/home:
{
.content_type = "text/html",
.body = "<p>My name is Andrei N. Ciobanu and this is a blog about my favourite musicians.</p>"
"<p>To contact me, please write an email to gnomemory (and then append yahoo.com)</p>"
"<p>List of favourite rock stars:</p>"
"<ol>"
"<li><a href='1'>David Bowie</a></li>"
"<li><a href='2'>Ozzy Osbourne</a></li>"
"<li><a href='3'>Jimmy Hendrix</a></li>" /* <<--- HERE*/
"</ol>"
}
And the final step is to re-compile the blog and re-run the server:
>> gcc -Wall microblog.c -o microblog
>> ./microblog
If we open http://localhost:8080 again, we will see the changes:
The following part of the article is not a step-by-step guide, so I suggest opening microblog.c to follow the code as you continue reading.
We start by defining our model, struct post_s:
#define TEXT_PLAIN "text/plain"
#define TEXT_HTML "text/html"
typedef struct post_s
{
char *content_type;
char *body;
} post;
We will keep things simple from the beginning: a blog post has a content_type and a body. The content_type can be either:
text/html, if we plan to serve classical HTML content;
text/plain, if we want to work with .txt files.
Our main strategy is to keep all the blog posts inside a global array, post posts[], which is known at compile time:
#include <stdio.h>
#define TEXT_PLAIN "text/plain"
#define TEXT_HTML "text/html"
typedef struct post_s
{
char *content_type;
char *body;
} post;
post posts[] = {
{
.content_type = TEXT_PLAIN,
.body = "Article 0"
},
{
.content_type = TEXT_HTML,
.body = "<p>Article 1</p>"
}
};
const size_t posts_size = (sizeof(posts) / sizeof(post));
int main(void) {
// Printing the posts to stdout
for(size_t i = 0; i < posts_size; i++) {
printf("post[%zu].content_type = %s\n", i, posts[i].content_type);
printf("post[%zu].body =\n %s\n", i, posts[i].body);
printf("\n");
}
return 0;
}
The posts array contains all the content of our blog. Each article is accessible by its index in the global array.
To modify or add something, we would have to alter the source code and then re-compile. But what if there is a way to keep the content outside the source file?
Remember the #include preprocessor directive? Well, contrary to popular belief, its usage is not limited to header files. We can use it to “externalize” our content, and #include it just before compilation:
/* microblog.c */
post posts[] = {
#include "posts"
};
Where ./posts is a file on the disk with the following structure:
/* posts file */
{
.content_type = TEXT_PLAIN,
.body = "Article 0"
},
{
.content_type = TEXT_HTML,
.body = "<p>Article 1</p>"
}
When we #include it, all of it gets inserted back into the source code.
At this point, we can go even further: we can separate the articles into their own files and #include them in ./posts. To easily visualise what’s happening, check out the following diagram:
To serve our blog content to browsers, we will have to implement a straightforward HTTP server that supports only GET requests.
A typical GET request looks like this:
GET /1 ........
...............
...............
...............
...........\r\n
The request is a char* (string) that ends with \r\n (CRLF) and starts with GET /<resource>.
Of course, the spec is infinitely more complex than this. But for the sake of simplicity, we will concentrate only on the first line, ignoring everything else. We will accept only numerical paths (<resources>). Internally, those paths represent indices in our post posts[] array. By convention, our homepage will be posts[0].
For example, to GET the homepage of our blog, the request should look like this:
GET / .........
...............
...............
...............
...........\r\n
To GET another post, let’s say the article posts[2], the request should look like this:
GET /2 ........
...............
...............
...............
...........\r\n
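The parsing rules above fit in one small function: only the first line matters, "/" maps to posts[0], and any other resource must be a number that indexes posts[]. microblog.c has its own helpers for this (set_http_req_res and set_post_idx). The sketch below is a hypothetical stand-in, not their actual code. It assumes a single space separates the resource from the HTTP version.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the actual set_http_req_res/set_post_idx code.
   Maps the request line of a GET request to an index in posts[]:
   "GET / ..." -> 0 (the homepage), "GET /2 ..." -> 2.
   Returns 0 on success and -1 otherwise. */
static int parse_post_idx(const char *req, size_t posts_size, size_t *idx) {
    if (posts_size == 0) return -1;
    /* Only GET requests are supported */
    if (strncmp(req, "GET /", 5) != 0) return -1;
    const char *res = req + 5;
    /* "GET / HTTP/1.1": the empty resource is the homepage, posts[0] */
    if (*res == ' ') {
        *idx = 0;
        return 0;
    }
    /* The resource must start with a digit (rejects "+1", "-1", "abc") */
    if (*res < '0' || *res > '9') return -1;
    char *end = NULL;
    errno = 0;
    unsigned long v = strtoul(res, &end, 10);
    /* The number must be followed by the space before the HTTP version
       and must be a valid index into posts[] */
    if (*end != ' ' || errno == ERANGE || v >= posts_size) return -1;
    *idx = (size_t)v;
    return 0;
}
```

Any of the following yields -1, which would correspond to the 404 branch of server_proc_req: an index equal to or larger than posts_size, a non-numeric resource, or a query string after the number.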
The code for creating the actual server is quite straightforward if you are already familiar with C socket programming (I wasn’t, so the code is probably not the best):
#define DEFAULT_BACKLOG 1000
#define DEFAULT_PORT 8080
#define DEFAULT_MAX_FORKS 5
#define DEFAULT_TIMEOUT 10000
int max_forks = DEFAULT_MAX_FORKS;
int cur_forks = 0;
void start_server() {
// Creates a Server Socket
int server_sock_fd = socket(
AF_INET, // Address family specific to IPv4 addresses
SOCK_STREAM, // TCP
0
);
// socket() returns -1 on failure (0 is a valid descriptor)
if (server_sock_fd == -1)
exit_with_error(ERR_SOC_CREATE);
struct sockaddr_in addr_in = {.sin_family = AF_INET,
.sin_addr.s_addr = INADDR_ANY,
.sin_port = htons(DEFAULT_PORT)};
memset(addr_in.sin_zero, '\0', sizeof(addr_in.sin_zero));
// Bind the socket to the address and port
if (bind(server_sock_fd, (struct sockaddr *)&addr_in,
sizeof(struct sockaddr)) == -1)
exit_with_error(ERR_SOC_BIND);
// Start listening for incoming connections
if (listen(server_sock_fd, DEFAULT_BACKLOG) < 0)
exit_with_error(ERR_SOC_LISTEN);
int client_sock_fd;
socklen_t addr_in_len = sizeof(addr_in);
for (;;) {
// A client has made a request
client_sock_fd = accept(server_sock_fd, (struct sockaddr *)&addr_in,
&addr_in_len);
if (client_sock_fd == -1) {
// TODO: LOG ERROR BUT DON'T EXIT
exit_with_error(ERR_SOC_ACCEPT);
}
pid_t proc = fork();
if (proc < 0) {
// log error
// Close client
close(client_sock_fd);
} else if (proc == 0) {
// We serve the request on a different
// subprocess; the child doesn't need the listening socket
close(server_sock_fd);
server_proc_req(client_sock_fd);
} else {
// We keep track of the number of forks
// the parent is creating
cur_forks++;
// No reason to keep this open in the parent
// We close it
close(client_sock_fd);
}
// Clean up some finished sub-processes
if (!(cur_forks<max_forks)) {
while (waitpid(-1, NULL, WNOHANG) > 0) {
cur_forks--;
}
}
}
close(server_sock_fd);
}
We start by creating server_sock_fd, which we bind to DEFAULT_PORT=8080 and then listen on. If those operations fail, the code exits with either ERR_SOC_BIND or ERR_SOC_LISTEN, depending on the error.
DEFAULT_BACKLOG is the maximum length to which the (internal) queue of pending connections for server_sock_fd can grow. If a connection request arrives when the queue is full, the client may receive an error indicating ECONNREFUSED.
If the first step is successful, we enter an infinite loop in which we accept new connections. Then, we process each incoming request in its own subprocess (using fork()).
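There’s a limit on the number of parallel forks we can have (see max_forks). Our code keeps track of the number of running forks through cur_forks. Whenever cur_forks reaches the limit, we start reaping the finished (zombie) sub-processes using waitpid(...). Because of WNOHANG, waitpid never blocks. This makes max_forks a threshold for reaping rather than a hard cap: if no child has finished yet, the server keeps accepting new connections.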
The function responsible for processing the request (server_proc_req) looks like this:
#define REQ_SIZE (1 << 13)
#define REQ_RES_SIZE (1 << 4)
#define REP_MAX_SIZE (REP_H_FMT_LEN + REP_MAX_CNT_SIZE)
void server_proc_req(int client_sock_fd) {
char rep_buff[REP_MAX_SIZE] = {0};
char req_buff[REQ_SIZE] = {0};
char http_req_res_buff[REQ_RES_SIZE] = {0};
int rec_status = server_receive(client_sock_fd, req_buff);
int rep_status;
if (rec_status == SR_CON_CLOSE) {
// Connection closed by peer
// There's no reason to send anything further
exit(EXIT_SUCCESS);
} else if (rec_status == SR_READ_ERR || rec_status == SR_READ_OVERFLOW) {
// Cannot Read Request(SR_READ_ERR) OR
// Request is bigger than(REQ_SIZE)
// In this case we return 400(BAD REQUEST)
rep_status = set_http_rep_400(rep_buff);
} else if (http_req_is_get(req_buff)) {
// Request is a valid GET
if (http_req_is_home(req_buff)) {
// The resource is "/" we return posts[0]
rep_status = set_http_rep_200(posts[0].content_type, posts[0].body,
strlen(posts[0].body) + 1, rep_buff);
} else {
// The resource is different than "/"
size_t p_idx;
set_http_req_res(req_buff, 5, http_req_res_buff);
if (set_post_idx(&p_idx, http_req_res_buff) < 0) {
// If the resource is not a number, or is a number
// out of range we return 404 NOT FOUND
rep_status = set_http_rep_404(rep_buff);
} else {
// We return the corresponding post based on the index
struct post_s post = posts[p_idx];
rep_status = set_http_rep_200(post.content_type, post.body,
strlen(post.body) + 1, rep_buff);
}
}
} else {
// The request looks valid but it's not a GET
// We return 501
rep_status = set_http_rep_501(rep_buff);
}
if (rep_status < 0) {
// There was an error constructing the response
// TODO LOG
} else {
server_send(client_sock_fd, rep_buff);
}
close(client_sock_fd);
exit(EXIT_SUCCESS);
}
We define three buffers:
rep_buff - here we keep the response we are sending back to the client;
req_buff - this contains the request coming from the client;
http_req_res_buff - here we keep the resource (the posts index) we want to access, which we extract from req_buff.
The most important functions called from server_proc_req are server_receive and server_send. These two functions read and write data from/to the socket.
The code is quite straightforward:
enum server_receive_ret {
SR_CON_CLOSE = -1,
SR_READ_ERR = -2,
SR_READ_OVERFLOW = -3
};
static int server_receive(int client_sock_fd, char *req_buff) {
ssize_t b_req = 0;
int tot_b_req = 0;
/* Read until the request is complete, the peer closes the connection,
an error occurs or req_buff is full */
for (;;) {
b_req = recv(client_sock_fd, &req_buff[tot_b_req],
REQ_SIZE - tot_b_req, 0);
/* Connection was closed by the peer */
if (b_req == 0) return SR_CON_CLOSE;
/* Reading Error */
if (b_req == -1) return SR_READ_ERR;
tot_b_req += (int)b_req;
/* HTTP Request is sent */
if (http_req_is_final(req_buff, tot_b_req)) break;
/* req_buff overflows */
if (tot_b_req >= REQ_SIZE) return SR_READ_OVERFLOW;
}
return tot_b_req;
}
enum server_send_errno { SS_ERROR = -1 };
static int server_send(int client_sock_fd, char *rep_buff) {
ssize_t w_rep = 0;
size_t tot_w_rep = 0;
size_t total = strlen(rep_buff) + 1;
while (tot_w_rep < total) {
/* send() may write fewer bytes than requested, so resume from the first unsent byte */
w_rep = send(client_sock_fd, rep_buff + tot_w_rep, total - tot_w_rep, 0);
if (w_rep < 0) return SS_ERROR;
tot_w_rep += (size_t)w_rep;
}
return (int)tot_w_rep;
}
The two functions (server_receive and server_send) are thin wrappers over recv() and send() that add extra checks on top.
For example, in server_receive we make sure we keep reading bytes (b_req) until we encounter CRLF (http_req_is_final) or we overflow (we read more bytes than req_buff can hold).
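server_receive delegates the “is the request complete?” decision to http_req_is_final, whose code isn’t shown here. One common way to implement such a check is sketched below, under a hypothetical name (req_head_complete). It assumes, as the HTTP/1.1 specification defines, that the request head ends with an empty line: the four bytes \r\n\r\n. It only scans the bytes received so far.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for http_req_is_final (its body is not shown here).
   Returns 1 if the first len bytes of buf contain the empty line "\r\n\r\n"
   that terminates an HTTP request head, 0 otherwise. It only looks at the
   first len bytes, so buf does not need to be NUL-terminated. */
static int req_head_complete(const char *buf, size_t len) {
    static const char term[] = "\r\n\r\n";
    const size_t tlen = sizeof(term) - 1; /* 4, without the '\0' */
    for (size_t i = 0; i + tlen <= len; i++) {
        if (memcmp(buf + i, term, tlen) == 0) return 1;
    }
    return 0;
}
```

The check can run after every recv() call because it rescans the whole buffer each time. A terminator split across two recv() calls is therefore still found. The cost is rescanning earlier bytes, which is negligible for an 8 KB buffer (REQ_SIZE).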
In server_send, we make sure that we send all the bytes from rep_buff. A single call to send doesn’t guarantee that. That’s why we do everything in a loop that checks how many bytes we’ve sent (using w_rep).
Lastly, the functions set_http_rep_200, set_http_rep_400, set_http_rep_404 and set_http_rep_501 are all “overloads” (if we can call them like this) of the set_http_rep function:
#define REP_FMT "%s%s\n"
#define REP_H_FMT "HTTP/%s %d \nContent-Type: %s\nContent-Length: %zu\n\n"
#define REP_H_FMT_LEN (sizeof(REP_H_FMT) + (1 << 6)) // sizeof counts the '\0' and is a compile-time constant
#define REP_MAX_CNT_SIZE (1 << 19)
#define REP_MAX_SIZE (REP_H_FMT_LEN + REP_MAX_CNT_SIZE)
enum set_http_rep_ret {
SHR_ENC_ERROR = -1,
SHR_HEAD_OVERFLOW = -2,
SHR_CNT_ENC_EROR = -3,
SHR_CNT_OVERFLOW = -4
};
static int set_http_rep(const char *http_ver, const http_s_code s_code,
const char *cnt_type, const char *cnt,
const size_t cnt_size, char *rep_buff) {
char h_buff[REP_H_FMT_LEN] = {0};
int bw_head = snprintf(h_buff, REP_H_FMT_LEN, REP_H_FMT, http_ver, s_code,
cnt_type, cnt_size);
if (bw_head < 0)
return SHR_ENC_ERROR;
else if (bw_head >= REP_H_FMT_LEN)
return SHR_HEAD_OVERFLOW;
size_t buff_size = bw_head + cnt_size;
if (buff_size > REP_MAX_SIZE) return SHR_CNT_OVERFLOW;
int bw_rep = snprintf(rep_buff, buff_size, REP_FMT, h_buff, cnt);
if (bw_rep < 0) return SHR_CNT_ENC_EROR;
return bw_rep;
}
This function constructs the message we will send back to the browser and makes sure we don’t overflow. It starts by building the response header:
"HTTP/%s %d \nContent-Type: %s\nContent-Length: %zu\n\n"
And then it appends the actual content:
#define REP_FMT "%s%s\n"
I’ve used snprintf() for both string concatenations to check for possible overflows or encoding errors.
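The same overflow check can be exercised in isolation. This is a hedged sketch: build_header is a hypothetical name, and two details are assumptions. The sketch fixes the version to HTTP/1.1, and it uses CRLF line endings, which HTTP/1.1 requires. REP_H_FMT uses plain \n instead, which the specification allows recipients to accept but does not require them to.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch of the header part of set_http_rep.
   snprintf returns the length the full output would have had (excluding '\0'),
   so a negative value means an encoding error and a value >= size means the
   output was truncated to fit the buffer. */
static int build_header(char *buf, size_t size, int code,
                        const char *cnt_type, size_t cnt_len) {
    int n = snprintf(buf, size,
                     "HTTP/1.1 %d \r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     code, cnt_type, cnt_len);
    if (n < 0) return -1;              /* encoding error */
    if ((size_t)n >= size) return -2;  /* truncated: buffer too small */
    return n;                          /* bytes written, '\0' excluded */
}
```

With a 16-byte buffer the header doesn’t fit. snprintf still writes a NUL-terminated prefix, but the function reports -2, much like SHR_HEAD_OVERFLOW.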
That’s all.
All in all, microblog.c was an exciting experiment. The code is to be taken lightly: a combination of software minimalism, poorly written C (I’m actually waiting for feedback), and a late April Fools’ Day joke.
The current article won’t be an exhaustive technical analysis of Gradle, but rather a spontaneous rant.
Firstly, the time I am willing to allocate for learning a build tool will always be limited. Secondly, I try to be fair, so the list of things I expect from a build tool is small:
Jokes “inside”, I’ve just described Maven.
Unfortunately, Gradle lost its way between the upgrade from version 4.x to 5.x, or between the upgrade from version 5.x to 6.x, or between the upgrade from version 6.x to 7.x, or between 7.x and 8.x. Or maybe Gradle was never the way. We were momentarily happy not having to write and read (pom.)XML files ever again, and we jumped ship too early. Our problem was never Maven, but XML…
The moment you exit the realm of straightforward build files, you will become lost and incredibly lonely. This will happen because you never had the patience (until then!) to read the documentation in its entirety. And bear in mind, Gradle’s documentation is not something you can skim on the weekend or on your mobile phone while sitting on the bus. On the contrary, it’s a “hard” read, full of specific technical jargon you need to familiarize yourself with.
Let me give you an example: the chapter “Learning the Basics -> Understanding the Build Lifecycle” starts with the following paragraph. This should be an easy read, given it’s an introductory article:
Quickly, without opening your CS reference book, tell me what a Directed Acyclic Graph is. It’s ok; you don’t have to open your CS reference book, because the authors of the documentation were kind enough to link a Wikipedia article:
After digging through the documentation for a few days (or up to a week, if you want to understand how multi-project builds work), things will become much clearer as you experience a few Eureka moments. This period is critical. After this week, you will either hate or love Gradle. It only depends on your overall tolerance for complexity and over-engineered solutions.
In any case, kudos! You are now part of a select group of people who managed to go through the Gradle documentation. But it would be wise to hide this fact from your team. Otherwise, your colleagues will make you the person responsible for the build file, a cross not always worth carrying.
My advice:
Java prides itself on being a conservative technology (for lack of a better word). The standard API rarely changes, and the old stuff keeps working, even if it falls from grace. People are not using Vector<T> anymore, but this doesn’t mean Vector<T> was removed from the standard library. To a lesser extent, the ecosystem of libraries, frameworks, and tools surrounding and supporting Java inherits this approach. Developers make great efforts to maintain backward compatibility, even between major versions.
This is not the case with Gradle. Upgrading to a new major version is always painful (for non-trivial builds):
Because of the API changes, the 3rd-party plugins you depended upon won’t work anymore unless their original authors update them. For this reason, the very popular Gradle Shadow plugin comes in 3 “flavors”, one for each major Gradle version that changed the API:
Maintaining multiple versions puts an unnecessary burden on the open-source maintainer. For example, the famous Log4Shell exploit was never fixed in older versions of the Gradle Shadow plugin (see issue). This forced users either to update their Gradle version, thereby provoking even more backward-compatibility havoc, or to implement alternative solutions.
On the one hand, I love that the Gradle people are forward-thinking, but the way things change from version to version is sometimes too much. If you plan to use Gradle, you should keep it constantly up to date. If you fall a few (major) versions behind, you will make your life harder than it should be.
My advice:
Make no mistake: when you choose to use Gradle, you will program your build file, not configure it.
The build.gradle is a running program in disguise (a DSL). This means you can write (business) logic into a build file by creating your own functions and hooking them into the Directed Acyclic Graph we were previously speaking of. So even if it’s not explicitly required, it’s time for you to learn a little bit of Groovy or Kotlin, depending on the Gradle dialect you pick.
As a fun exercise, let’s write a build.gradle file that fails the build if the temperature in Bucharest is lower than 25 degrees (Celsius). We need to write a new task called howIsTheWeatherInBucharest. It connects to the https://openweathermap.org/ API through a REST call, performs the check, and fails the build if the day is too cold for programming.
// rest of the build file
task howIsTheWeatherInBucharest {
doLast {
// Quick and dirty code
def apiKey = '<...enter api code here...>'
def req = new URL('https://api.openweathermap.org/data/2.5/weather?q=Bucharest&units=metric&appid=' + apiKey).openConnection()
req.setRequestMethod("GET")
req.setRequestProperty("Content-Type", "application/json; charset=UTF-8")
req.setDoOutput(true)
def is = req.getInputStream();
def resp = new Scanner(is).useDelimiter("\\A").next();
def json = new groovy.json.JsonSlurper().parseText(resp)
def temp = Double.parseDouble(json.main.temp.toString())
if (temp < 25.0) {
throw new GradleException("Build failed: the weather in Bucharest is bad")
}
}
}
compileJava.dependsOn(howIsTheWeatherInBucharest)
Having the ability to write code is seductive, but it opens Pandora’s Box. The programmer’s reflex to throw in some custom functions to make things work will kick in, especially if the build file is complex. And to be honest, writing your build file with a programmer’s mindset is more natural than trying to circumvent the DSL.
But let’s take a step back, and ask ourselves if this is what we want from a build tool?! Writing quick and dirty code can spiral into writing more and more quick ‘n dirty code. Other people in your extended team can add their personal quick ‘n dirty code. Without the ability to properly debug the build process and the non-standard hacks people are willing to put in the build file, things can become less portable and extremely environment-dependent or simply not idempotent. Should you always be online to build your project? Should you be inside a Private Network?
Do you remember when people were creating .sh scripts to build things? There’s a reason we’ve stopped doing so.
Even if it’s easy to do, the Gradle build shouldn’t replace a proper CI/CD pipeline. Yet I’ve worked on and seen projects where the build process was doing much more than assembling a fat jar. It ran tests and integrated directly with SonarQube, created custom reports based on the static code analysis results, performed infrastructure changes, etc. Why!? Because it was possible.
My advice:
Gradle 5.0 came with a “game”-breaking change: devs were allowed to use an experimental Kotlin DSL to replace the historical Groovy DSL. In theory, this was terrific news, especially for the Android folks who were already flocking to Kotlin. In reality, this fragmented the community and brought even more confusion to the uninitiated.
For example, searching Stack Overflow for Gradle issues suddenly became more arduous. First of all, most of the answers you will find on the Internet are outdated. You cannot simply hack your way through with a solution targeting 5.0 if you are already at version 7.0. Chances are it won’t work. But now you also need to be attentive to the dialect! You may find a working solution that uses Groovy, and you will have to translate it to Kotlin yourself (manually).
Compared to the Groovy DSL, the Kotlin DSL seems to be stricter and more opinionated. After all, Kotlin has a stricter type system than Groovy. So if you are a Java developer planning to use Gradle with the Kotlin DSL, you have to be familiar with Groovy (to be able to read the old materials), but you also need to learn enough Kotlin to write your build file. A little bit of learning new things didn’t kill anyone, but I am asking again: why is it necessary to learn a new programming language to master a build tool!?
My advice:
I’ve promised myself not to use Gradle anymore, but I still do it from time to time, especially for smaller, contained, personal projects.
This article explains a straightforward approach for generating Perfect Hash Functions and using them in tandem with a Map<K,V> implementation called ReadOnlyMap<K,V>. It assumes the reader is already familiar with concepts like hash functions and hash tables. If you want to refresh your knowledge of these two topics, I recommend reading some of my previous articles: Implementing Hash Tables in C and A tale of Java Hash Tables.
A Perfect Hash Function (PHF) \(H\) is a hash function that maps distinct elements from a set \(S\) to a range of integer values \(\{0, 1, \ldots\}\), so that there are no collisions. In other words, \(H\) is injective. This means that for any \(x_{1}, x_{2} \in S\), if \(H(x_{1})=H(x_{2})\), we can say for sure that \(x_{1}=x_{2}\). The contrapositive is also true: if \(x_{1} \neq x_{2}\), then for sure \(H(x_{1}) \neq H(x_{2})\).
Moreover, a Minimal Perfect Hash Function (MPHF) is a PHF \(H\) defined on a finite set \(S = \{a_0, a_1, \ldots, a_{m-1}\}\) of size \(m\), with values in the range of integers \(\{0, 1, \ldots, m-1\}\).
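To make the two definitions concrete, here is a small checker. It is a sketch written for this explanation, not part of the article's code, and the class and method names are hypothetical. Given a key set and a candidate function, it tests whether the function is injective on the keys (a PHF) and whether its values also fill exactly \(\{0, \ldots, m-1\}\) (an MPHF).

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.ToIntFunction;

// Hypothetical helper: checks the PHF / MPHF definitions on a concrete key set.
final class PerfectHashCheck {

    // PHF: distinct keys must get distinct hash values (H is injective on S).
    static <T> boolean isPerfect(Collection<T> keys, ToIntFunction<T> h) {
        Set<Integer> seen = new HashSet<>();
        for (T key : keys) {
            if (!seen.add(h.applyAsInt(key))) {
                return false; // two keys share a value: a collision
            }
        }
        return true;
    }

    // MPHF: perfect, and every value lies in {0, ..., m-1} where m = |S|.
    // m distinct values inside a range of size m means every slot is used.
    static <T> boolean isMinimalPerfect(Collection<T> keys, ToIntFunction<T> h) {
        int m = keys.size();
        boolean[] used = new boolean[m];
        for (T key : keys) {
            int v = h.applyAsInt(key);
            if (v < 0 || v >= m || used[v]) {
                return false;
            }
            used[v] = true;
        }
        return true;
    }
}
```

For example, on the keys "a", "bb", "ccc", String::length is a PHF (values 1, 2, 3) but not an MPHF, because 3 falls outside \(\{0, 1, 2\}\); s -> s.length() - 1 is minimal.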
A function like this would be fantastic to use in the context of Hash Tables, wouldn’t it!?
In theory, without collisions, every element goes straight into an empty bucket without risking finding an intruder already settled. Or, in the case of Open Addressing Hash Tables, the element doesn’t have to probe the buckets to find its place in the Universe.
Another (significant) advantage of using an MPHF is space: we don’t have to impose a load factor on the Hash Table, because we know there’s a perfect (1:1) association between elements and buckets. When using an MPHF, the load factor is 1.0, compared to ~0.66-0.8 for Open Addressing Hash Tables or 0.7 for classic Separate Chaining implementations.
But don’t get too excited; there are a few gotchas:
The idea for generating PHFs and MPHFs is not new; it first appeared in 1984 in a paper called Storing a Sparse Table with O(1) Worst Case Access Time. A significant improvement was proposed in 2009 with the paper Hash, displace, and compress (which the current article is based upon).
Meanwhile, more algorithms have emerged, and most of them are already implemented and described in the cmph library. What is nice about cmph is that you can compile the lib as a standalone executable that allows you to generate MPHFs from the command line. Currently, cmph supports the following algorithms: CHD, BDZ, BMZ, BRZ, CHM, and FCH, each with its own pros and cons, as explained on this page.
If you are passionate about this topic, you can also check Reini Urban’s Perfect-Hash for PHFs code generation, and his article on the topic.
In this article, we are going to implement in Java a “naive” version of CHD, based on how Steve Hanov implemented it in Python in his article “Throw away the keys: Easy, Minimal Perfect Hashing”, and on this C implementation by William Ahern.
If you want to check out the code associated with this project:
git clone git@github.com:nomemory/mphmap.git
To get a better understanding of what we are going to achieve by the end of this article, let’s suppose we have a set \(S\) of 15 keys, the names of Roman Emperors:
Set<String> emperors =
Set.of("Augustus", "Tiberius", "Caligula",
"Claudius", "Nero", "Vespasian",
"Titus", "Dominitian", "Nerva",
"Trajan", "Hadrian", "Antonious Pius",
"Marcus Aurelius", "Lucius Verus", "Commodus");
We want to find a function \(H\) that distributes the keys into 15 hash buckets in the range [0, 1, ..., 14], one key per bucket.
If we use Java’s built-in String.hashCode() on the keys, we will get a few collisions:
// Initialize buckets as an empty List<ArrayList<String>>
List<ArrayList<String>> buckets =
Stream.generate(() -> new ArrayList<String>()).limit(emperors.size()).toList();
// Distributing elements in the buckets
emperors.forEach(s -> {
// We apply & 0xfffffff (keeping the lower 28 bits) because, by default,
// hashCode() can return a negative value
int hash = (s.hashCode() & 0xfffffff) % buckets.size();
buckets.get(hash).add(s);
});
// Printing buckets contents
for (int i = 0; i < buckets.size(); i++) {
System.out.printf("bucket[%d]=%s\n", i, buckets.get(i));
}
Output:
bucket[0]=[Augustus]
bucket[1]=[]
bucket[2]=[Tiberius]
bucket[3]=[]
bucket[4]=[Lucius Verus]
bucket[5]=[]
bucket[6]=[]
bucket[7]=[]
bucket[8]=[Caligula]
bucket[9]=[Antonious Pius]
bucket[10]=[]
bucket[11]=[Dominitian]
bucket[12]=[Hadrian, Titus, Claudius]
bucket[13]=[Nerva, Marcus Aurelius]
bucket[14]=[Trajan, Vespasian, Commodus, Nero]
As you can see, the distribution is far from perfect: nine of the 15 keys share a bucket with at least one other key, while six buckets remain empty.
In this article, we will implement a class, PHF.java, that will be able to distribute the 15 Roman emperors into their own personal buckets. This would be the right thing to do, because it would be unfair to have "Trajan" sharing the same space with "Nero".
Set<String> emperors =
Set.of("Augustus", "Tiberius", "Caligula",
"Claudius", "Nero", "Vespasian",
"Titus", "Dominitian", "Nerva",
"Trajan", "Hadrian", "Antonious Pius",
"Marcus Aurelius", "Lucius Verus", "Commodus");
// Initializing our Minimal Perfect Hash Function we are going to build
// in this article
PHF phf = new PHF(1.0, 4, Integer.MAX_VALUE);
phf.build(emperors, String::getBytes);
// Putting elements in the buckets
final String[] buckets = new String[emperors.size()];
emperors.forEach(emperor -> buckets[phf.hash(emperor.getBytes())] = emperor);
// Printing the results
for (int i = 0; i < buckets.length; i++) {
System.out.printf("bucket[%d]=%s\n", i, buckets[i]);
}
Output:
bucket[0]=Titus
bucket[1]=Vespasian
bucket[2]=Claudius
bucket[3]=Marcus Aurelius
bucket[4]=Nero
bucket[5]=Nerva
bucket[6]=Caligula
bucket[7]=Commodus
bucket[8]=Augustus
bucket[9]=Tiberius
bucket[10]=Hadrian
bucket[11]=Lucius Verus
bucket[12]=Antonious Pius
bucket[13]=Trajan
bucket[14]=Dominitian
As you can see, there are no collisions now. Our function PHF.hash() works “flawlessly” in this regard: each Emperor gets their own bucket.
Don’t get too excited. Let’s “microbenchmark” how fast our new function is compared to the established Object.hashCode().
Being ten times slower than Object.hashCode() is not what I intended to show you, but you get the main idea: PHF.hash() will be slower. My implementation is not exactly the best, but even if you apply some heavy optimizations, you won’t be able to get significant improvements.
Now let’s see how PHF.java is implemented.
1. We split our initial set \(S\), which contains all the possible keys, into \(r\) virtual “buckets” \(B_{i}\), with \(0 \le i \lt r\). To obtain those buckets, we use a first-level hash function \(g(x)\), so that \(B_{i}=\{ x \in S \mid g(x)=i \}\).
2. We sort the \(B_{i}\) buckets in descending order according to their size, \(\mid B_{i} \mid\), keeping their initial index for later use. The main idea is to start with the problematic buckets first (the ones having the most collisions).
3. We initialize an array \(T\) of size \(m\), with all its elements set to 0. In our particular example, \(m=15\). In \(T\) we keep track of the slots that are already taken, so that our PHF doesn’t produce collisions.
4. We iterate over each bucket \(B_{i}\) with \(i\in\{0,1,\ldots,r-1\}\), in the order obtained at step 2, until we reach a bucket with \(\mid B_{i}\mid=1\) (the size of \(B_{i}\) is 1):
4.1. We try \(l = 1, 2, \ldots\) until the second-level hash function \(\phi_l(x)\) sends every \(x \in B_{i}\) to a distinct slot of \(T\) that is still empty;
4.2. We mark those slots as taken in \(T\) and store \(\sigma(i)=l\).
5. For the remaining buckets with \(\mid B_i \mid=1\), we look for the remaining empty slots in \(T\) and place the remaining elements into them, one by one. For each such bucket, we store \(\sigma(i)=-\text{position}-1\).
The last part is to find a way of computing \(H(x)\), which is our PHF: \(H(x) = \left\{ \begin{array}{ll} \phi_{\sigma(g(x))}(x) & \text{if } \sigma(g(x)) > 0 \\ -\sigma(g(x))-1 & \text{if } \sigma(g(x))\leq 0 \end{array} \right.\)
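The five steps and the formula for \(H(x)\) can be condensed into one small, self-contained Java class. This is a minimal sketch under stated assumptions, not the PHF.java implementation discussed in this article: it only builds minimal functions (a load factor of 1.0), the class name ToyMphf is hypothetical, and seededHash is a hypothetical stand-in for MurmurHash3 (FNV-1a started from a seed-dependent value, followed by MurmurHash3's fmix32 finalizer), chosen so the example needs no external library. Here \(g(x)\) is the seed-0 hash and \(\phi_l(x)\) is the seed-\(l\) hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy hash-and-displace MPHF following steps 1-5 (load factor fixed at 1.0).
final class ToyMphf {
    private final int[] sigma; // sigma(i) for each first-level bucket B_i
    private final int m;       // size of T, equal to the number of keys

    ToyMphf(Set<String> keys, int keysPerBucket) {
        m = keys.size();
        int r = Math.max(1, m / keysPerBucket);
        sigma = new int[r];
        // Step 1: split S into r buckets, g(x) = seededHash(x, 0) % r
        List<List<byte[]>> buckets = new ArrayList<>();
        for (int i = 0; i < r; i++) buckets.add(new ArrayList<>());
        for (String key : keys) {
            byte[] b = key.getBytes(StandardCharsets.UTF_8);
            buckets.get(seededHash(b, 0) % r).add(b);
        }
        // Step 2: visit bucket indices by descending bucket size
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < r; i++) order.add(i);
        order.sort(Comparator.comparingInt((Integer idx) -> buckets.get(idx).size()).reversed());
        // Step 3: T as a bit set, all slots initially free
        BitSet taken = new BitSet(m);
        int freeSlot = 0;
        for (int i : order) {
            List<byte[]> bucket = buckets.get(i);
            if (bucket.isEmpty()) break; // sorted, so every remaining bucket is empty
            if (bucket.size() == 1) {
                // Step 5: next free slot, stored as sigma(i) = -position - 1
                freeSlot = taken.nextClearBit(freeSlot);
                taken.set(freeSlot);
                sigma[i] = -freeSlot - 1;
                continue;
            }
            // Step 4: try l = 1, 2, ... until the bucket lands on distinct free slots
            for (int l = 1; sigma[i] == 0; l++) {
                if (l == Integer.MAX_VALUE) throw new IllegalStateException("No seed for bucket " + i);
                Set<Integer> slots = new HashSet<>();
                boolean ok = true;
                for (byte[] b : bucket) {
                    int slot = seededHash(b, l) % m;
                    if (taken.get(slot) || !slots.add(slot)) { ok = false; break; }
                }
                if (ok) {
                    slots.forEach(taken::set);
                    sigma[i] = l;
                }
            }
        }
    }

    // H(x) = phi_sigma(g(x))(x) if sigma(g(x)) > 0, otherwise -sigma(g(x)) - 1
    int hash(String key) {
        byte[] b = key.getBytes(StandardCharsets.UTF_8);
        int s = sigma[seededHash(b, 0) % sigma.length];
        return s > 0 ? seededHash(b, s) % m : -s - 1;
    }

    // Hypothetical stand-in for MurmurHash3: seeded FNV-1a + fmix32, sign bit cleared
    static int seededHash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ (seed * 0x9E3779B9);
        for (byte x : data) {
            h ^= (x & 0xff);
            h *= 0x01000193;
        }
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & 0x7fffffff;
    }
}
```

With keysPerBucket = 4, the 15 emperors fall into \(r = 3\) first-level buckets, so step 4 has to place large buckets in one go and may need many values of \(l\); with keysPerBucket = 1, most buckets hold zero or one key and are placed directly by step 5, at the cost of a larger \(\sigma\) array.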
Visually \(H(x)\) works like this:
Don’t worry if things don’t make a lot of sense now; the code is easier to implement than the actual presentation.
In practice, we don’t need two separate functions for \(g(x)\) and \(\phi_l(x)\); we simply use the convention \(g(x)=\phi_0(x)\). Because of this, at step 4.1, when we increment \(l\) for computing \(\phi_l(x)\), we start with \(l \in \{1,2,\ldots\}\).
From a code perspective, my function of choice was MurmurHash, using this Apache implementation:
// Constants for 32-bit variant
private static final int C1_32 = 0xcc9e2d51;
private static final int C2_32 = 0x1b873593;
private static final int R1_32 = 15;
private static final int R2_32 = 13;
private static final int M_32 = 5;
private static final int N_32 = 0xe6546b64;
public static int hash32x86(final byte[] data, final int offset, final int length, final int seed) {
int hash = seed;
final int nblocks = length >> 2;
// body
for (int i = 0; i < nblocks; i++) {
final int index = offset + (i << 2);
final int k = getLittleEndianInt(data, index);
hash = mix32(k, hash);
}
// tail
final int index = offset + (nblocks << 2);
int k1 = 0;
switch (offset + length - index) {
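case 3:
k1 ^= (data[index + 2] & 0xff) << 16;
// fall through
case 2:
k1 ^= (data[index + 1] & 0xff) << 8;
// fall through
case 1:
k1 ^= (data[index] & 0xff);
// mix functions
k1 *= C1_32;
k1 = Integer.rotateLeft(k1, R1_32);
k1 *= C2_32;
hash ^= k1;
}
hash ^= length;
return fmix32(hash);
}
// Reads four bytes starting at index as a little-endian int
// (helper used by the body loop of hash32x86)
private static int getLittleEndianInt(final byte[] data, final int index) {
return ((data[index] & 0xff)) |
((data[index + 1] & 0xff) << 8) |
((data[index + 2] & 0xff) << 16) |
((data[index + 3] & 0xff) << 24);
}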
private static int mix32(int k, int hash) {
k *= C1_32;
k = Integer.rotateLeft(k, R1_32);
k *= C2_32;
hash ^= k;
return Integer.rotateLeft(hash, R2_32) * M_32 + N_32;
}
private static int fmix32(int hash) {
hash ^= (hash >>> 16);
hash *= 0x85ebca6b;
hash ^= (hash >>> 13);
hash *= 0xc2b2ae35;
hash ^= (hash >>> 16);
return hash;
}
In this regard:
\(g(x)\) is MurmurHash3.hash32x86(data, 0, data.length, 0);
\(\phi_l(x)\) is MurmurHash3.hash32x86(data, 0, data.length, l).
PHF.java class
We start with the following constructor and class attributes:
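// Seed used for g(x); phi_l(x) uses the seeds l = 1, 2, ...
protected static final int INIT_SEED = 0;
// Assumed value: clears the sign bit so that hashes are non-negative
protected static final int SIGN_MASK = 0x7fffffff;
protected double loadFactor;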
protected int keysPerBucket;
protected int maxSeed;
protected int numBuckets;
public int[] seeds;
public PHF(double loadFactor, int keysPerBucket, int maxSeed) {
if (loadFactor>1.0) {
throw new IllegalArgumentException("Load factor should be <= 1.0");
}
this.loadFactor = loadFactor;
this.keysPerBucket = keysPerBucket;
this.maxSeed = maxSeed;
}
this.loadFactor is 1.0 if we want an MPHF, or less than 1.0 if we want to generate a (non-minimal) PHF.
this.keysPerBucket represents the average number of keys per bucket \(B_{i}\). For example, if \(S\) has 15 elements and we pick keysPerBucket=3, then we will have 5 buckets: \(B_{0}, B_{1}, B_{2}, B_{3}, B_{4}\), each with an average of 3 elements.
this.maxSeed is the maximum value of \(l\). If, for a bucket \(B_{i}\), we cannot find a suitable \(l \lt \text{maxSeed}\), then our search for the MPHF has failed, and we throw an exception.
this.seeds corresponds to \(\sigma(i)\) from the algorithm description.
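Put together, the two array sizes used later in build() follow from these parameters (integer division, as in the code), for \(n = \mid S \mid\) keys; the numbers below are a worked example, not values from the article's tests:

```latex
\begin{aligned}
r &= \left\lfloor \frac{n}{\text{keysPerBucket}} \right\rfloor && \text{(length of } \sigma \text{, i.e. seeds)} \\
m &= \left\lfloor \frac{n}{\text{loadFactor}} \right\rfloor && \text{(size of } T \text{)} \\
n = 15,\ \text{keysPerBucket} = 3,\ \text{loadFactor} = 1.0 &\Rightarrow r = 5,\ m = 15 \\
n = 15,\ \text{keysPerBucket} = 3,\ \text{loadFactor} = 0.75 &\Rightarrow r = 5,\ m = 20
\end{aligned}
```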
The next step is to define two hash functions:
One for internal use (a wrapper over MurmurHash3.hash32x86(...));
The actual \(H(x)\).
public int hash(byte[] obj) {
// g(x)
int seed = internalHash(obj, INIT_SEED) % seeds.length;
// if σ(g(x)) <= 0
if (seeds[seed]<=0) {
// we return -σ(g(x))-1
return -seeds[seed]-1;
}
// we return ϕ_σ(g(x))(x)
int finalHash = internalHash(obj, seeds[seed]) % this.numBuckets;
return finalHash;
}
// IF val == 0 => g(x)
// ELSE acts like ϕ(x)
protected static int internalHash(byte[] obj, int val) {
return MurmurHash3.hash32x86(obj, 0, obj.length, val) & SIGN_MASK;
}
At this point, we introduce a class called PHFBucket. Its sole purpose is to store the original index of each bucket, so that it survives the sorting from step 2:
private static class PHFBucket implements Comparable<PHFBucket> {
ArrayList<byte[]> elements;
int originalBucketIndex; // stores the original index
static PHFBucket from(ArrayList<byte[]> bucket, int originalIndex) {
PHFBucket result = new PHFBucket();
result.elements = bucket;
result.originalBucketIndex = originalIndex;
return result;
}
// Written in a way so we can sort in reverse order
@Override
public int compareTo(PHFBucket o) {
return o.elements.size() - this.elements.size();
}
@Override
public String toString() {
return "Bucket{" +
"elements.size=" + elements.size() +
", originalBucketIndex=" + originalBucketIndex +
'}';
}
}
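As a design alternative, sketched here rather than taken from the article, the order from step 2 can be obtained without a wrapper class or an inverted compareTo(): sort a list of bucket indices with a reversed Comparator, so that every element of the sorted list already is the original index. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical alternative to PHFBucket: order bucket indices by descending size (step 2)
final class SortByBucketSize {

    static List<Integer> descendingBySize(List<? extends List<?>> buckets) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < buckets.size(); i++) {
            order.add(i); // the original bucket index
        }
        // List.sort is stable, so buckets of equal size keep their original relative order
        order.sort(Comparator.comparingInt((Integer idx) -> buckets.get(idx).size()).reversed());
        return order;
    }
}
```

For buckets of sizes 1, 3, 0, and 2, the result is [1, 3, 0, 2].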
The last step is to define the actual algorithm. At the end, this.seeds[] will contain all the values necessary to compute the MPHF (or PHF).
public <T> void build(Set<T> inputElements, Function<T, byte[]> objToByteArrayMapper) {
int seedsLength = inputElements.size() / keysPerBucket; // r (number of first-level buckets)
int numBuckets = (int) (inputElements.size() / loadFactor); // m (size of T)
this.numBuckets = numBuckets;
// The seeds have to be calculated
// From an algorithm perspective this is σ(i)
this.seeds = new int[seedsLength];
// Fill the buckets with empty values initially
ArrayList<byte[]> buckets[] = new ArrayList[seedsLength];
for (int i = 0; i < buckets.length; i++) {
buckets[i] = new ArrayList<>();
}
// Adding elements to buckets
// (step 1. from the algorithm)
inputElements.stream().map(objToByteArrayMapper).forEach(el -> {
int index = (internalHash(el, INIT_SEED) % seedsLength);
buckets[index].add(el);
});
// Sorting so we can start with buckets with the most items
// (step 2. from the algorithm)
ArrayList<PHFBucket> sortedBuckets = new ArrayList<>();
for (int i = 0; i < buckets.length; i++) {
sortedBuckets.add(PHFBucket.from(buckets[i], i));
}
sort(sortedBuckets);
// For each bucket we try to find a function for which the seed has no collisions
// occupied represents T
// (step3)
BitSet occupied = new BitSet(numBuckets);
int sortedBucketIdx = 0;
PHFBucket bucket;
Integer originalIndex;
ArrayList<byte[]> bucketElements;
Set<Integer> occupiedBucket;
// (step 4.)
for(; sortedBucketIdx < sortedBuckets.size(); sortedBucketIdx++) {
bucket = sortedBuckets.get(sortedBucketIdx);
originalIndex = bucket.originalBucketIndex;
bucketElements = bucket.elements;
// If the buckets start to have a single element we don't have
// to do any additional computation, we can break the loop
if (bucketElements.size()==1) {
break;
}
// For each seed
int seedTry = INIT_SEED + 1;
for (; seedTry < maxSeed; seedTry++) {
occupiedBucket = new HashSet<>();
// For each element in the bucket
int eIdx = 0;
for (; eIdx < bucketElements.size(); eIdx++) {
int hash = internalHash(bucketElements.get(eIdx), seedTry) % numBuckets;
if (occupied.get(hash) || occupiedBucket.contains(hash)) {
// Trying with this seed is not successful, we break the loop
// So we can try with another seed
break;
}
occupiedBucket.add(hash);
}
if (eIdx == bucketElements.size()) {
// In this case, the bucket's elements displace well,
// we can add them to occupied and the seed to 'seeds'
occupiedBucket.forEach(occupied::set);
this.seeds[originalIndex] = seedTry;
break;
}
}
// If seedTry == maxSeed, we've failed constructing a Perfect Hash Function
// This means we've exhausted the possible seeds
if (seedTry==maxSeed) {
throw new IllegalStateException("Cannot construct perfect hash function");
}
}
// At this point only the buckets with one element remain, we need to add them
// to seed, we continue the iteration
// (step 5.)
int occupiedIdx = 0; // start from the first position
for(; sortedBucketIdx < sortedBuckets.size(); sortedBucketIdx++) {
bucket = sortedBuckets.get(sortedBucketIdx);
originalIndex = bucket.originalBucketIndex;
bucketElements = bucket.elements;
if (bucketElements.size()==0) {
break;
}
while(occupied.get(occupiedIdx)) {
// increase position so we can find an empty slot
occupiedIdx++;
}
occupied.set(occupiedIdx);
// We subtract (-1) to cover 0 cases
this.seeds[originalIndex] = -(occupiedIdx)-1;
}
}
Map<K,V>
Now that we have PHF.java, we can create a Hash Table called ReadOnlyMap<K,V> that only allows read operations:
import java.util.ArrayList;
import java.util.Map;
import java.util.function.Function;
public class ReadOnlyMap<K,V> {
protected static final double LOAD_FACTOR = 1.0;
protected static final int KEYS_PER_BUCKET = 1;
protected static final int MAX_SEED = Integer.MAX_VALUE;
private PHF phf;
private ArrayList<V> values;
private Function<K,byte[]> mapper;
public static final <K,V> ReadOnlyMap<K,V> snapshot(Map<K, V> map, Function<K, byte[]> mapper, double loadFactor, int keysPerBucket, int maxSeed) {
ReadOnlyMap<K,V> result = new ReadOnlyMap<>();
result.phf = new PHF(loadFactor, keysPerBucket, maxSeed);
result.phf.build(map.keySet(), mapper);
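// PHF.hash() can return indices up to (size / loadFactor) - 1, so we allocate
// one slot per PHF bucket (equal to the number of keys when loadFactor == 1.0)
int slots = (int) (map.keySet().size() / loadFactor);
result.values = new ArrayList<>(slots);
for (int i = 0; i < slots; i++) {
result.values.add(null);
}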
result.mapper = mapper;
map.forEach((k, v) -> {
int hash = result.phf.hash(mapper.apply(k));
result.values.set(hash, v);
});
return result;
}
public static final <K,V> ReadOnlyMap<K,V> snapshot(Map<K,V> map, Function<K, byte[]> mapper) {
return snapshot(map, mapper, LOAD_FACTOR, KEYS_PER_BUCKET, MAX_SEED);
}
public static final <V> ReadOnlyMap<String, V> snapshot(Map<String, V> map) {
return snapshot(map, String::getBytes);
}
public V get(K key) {
int hash = phf.hash(mapper.apply(key));
return values.get(hash);
}
}
Using the ReadOnlyMap<K,V> is quite straightforward. The only way to create a ReadOnlyMap<K,V> is as a snapshot of a “normal” read-write Map<K,V>:
public static void main(String[] args) {
Set<String> emperors =
Set.of("Augustus", "Tiberius", "Caligula",
"Claudius", "Nero", "Vespasian",
"Titus", "Dominitian", "Nerva",
"Trajan", "Hadrian", "Antonious Pius",
"Marcus Aurelius", "Lucius Verus", "Commodus");
// Creates a "normal map" from the given keys
final Map<String, String> mp = new HashMap<>();
emperors.forEach(emp -> {
mp.put(emp, emp+"123");
});
// Creates a "read-only map" from the previous map
final ReadOnlyMap<String, String> romp = ReadOnlyMap.snapshot(mp);
emperors.forEach(emp -> {
System.out.println(emp + ":" + romp.get(emp));
});
}
HashMap<K,V> vs. ReadOnlyMap<K,V>
We’ve already benchmarked how “fast” PHF.hash() is at the beginning of the article, and we’ve seen it can be up to 10 times slower than Object.hashCode(). But let’s see how fast ReadOnlyMap<K,V> is compared to HashMap<K,V>.
For this, I’ve written the following benchmark, which tests the performance of the get operation on maps of up to 20_000_000 keys:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Fork(value = 3, jvmArgs = {"-Xms6G", "-Xmx16G"})
@Warmup(iterations = 3, time = 10)
@Measurement(iterations = 5, time = 10)
public class TestReads {
@Param({"1000", "100000", "1000000", "10000000", "20000000"})
private int size;
private Map<String, String> map;
private ReadOnlyMap<String, String> readOnlyMap;
private MockUnitString stringsGenerator = words().map(s -> s + ints().get()).mapToString();
private List<String> keys;
@Setup(Level.Trial)
public void initMaps() {
keys = stringsGenerator.list(size).get();
this.map = new HashMap<>();
keys.forEach(key -> map.put(key, "abc"));
this.readOnlyMap = ReadOnlyMap.snapshot(map);
}
@Benchmark
public void testGetInMap(Blackhole bh) {
bh.consume(map.get(From.from(keys).get()));
}
@Benchmark
public void testGetInReadOnlyMap(Blackhole bh) {
bh.consume(readOnlyMap.get(From.from(keys).get()));
}
}
The results were quite interesting:
Even if the hash function is slower, given the higher memory locality, ReadOnlyMap performs a little bit faster than a normal HashMap<K,V>.
Later Edit:
The initial benchmark I performed was incorrect, so I reported erroneous results: in reality, HashMap<K,V> is faster.
If you just want to jump directly into the code, check out this repository.
As Wikipedia states, Bloom Filters are space-efficient, probabilistic data structures, conceived by Burton Howard Bloom in 1970, used to test whether an element is a member of a set or not. What I find peculiar is that the real Mr. Burton Howard Bloom doesn’t have a wiki page, while the imaginary Mr. Leopold Bloom has one.
In short, a Bloom Filter is a data structure where we “add” elements. But after the addition, we cannot recover them. They are chopped and hashed into pieces, and only a tiny footprint of what they once were remains. Afterward, we can ask the filter a delicate question:
[RandomDeveloper]: Is the element E in the set U or not?
The two possible answers a Bloom Filter can give us are:
[BloomFilter]: I am 100% sure the element E is not in U;
Or:
[BloomFilter]: I am almost sure the element E is in U, but I cannot guarantee it…
For most of the non-critical scenarios you can think of, even the second answer is satisfactory in light of how little space a Bloom Filter occupies. For example, you can check Prof. Michael Mitzenmacher’s presentation “Bloom Filters, Cuckoo Hashing, Cuckoo Filters, Adaptive Cuckoo Filters, and Learned Bloom Filters”, where he describes an ancient use of Bloom Filters: spellcheckers.
So, once upon a time, when memory was scarce, one of the first spellcheckers was based on a Bloom Filter that occupied 25KB to determine if 210KB of English words had possible spelling mistakes. Even if things have evolved since then, and spellcheckers don’t (necessarily or exclusively) use Bloom Filters anymore, the numbers are still impressive. 25KB for a killer word editor feature in the early 90s is not that bad.
But Bloom Filters were not part of the undergrad curricula I took while studying at the University, and I’ve rarely seen them (directly) used in practice. Even more experienced engineers tend to ignore their properties and advantages and wrongly (or rather unjustly) replace them with Hash Tables or various Set implementations. Also, most programming languages don’t have them implemented in their standard libraries.
On a brighter note, they still fit well in various back-end architectures where we need to implement blacklists, or as parts of complex caching systems. For a production implementation, check RedisBloom.
In any case, what is particularly interesting about Bloom Filters is that even the most straightforward “book implementation” works decently. Compared, for example, to the extreme arts of implementing an efficient Hash Table, writing an efficient Bloom Filter yourself is more approachable. The knowledge needed to write one fits inside our heads, without checking a reference book on Data Structures and Algorithms.
The best way to understand how Bloom Filters work is to “visualize” them:
\(E_{1},\ E_{2},\ E_{3},\ \ldots\) and so on are the elements we want to add to the Bloom Filter. They can be anything, and there can be as many of them as necessary, given there’s enough memory available.
\(h_{1}(x),\ h_{2}(x),\ h_{3}(x),\ h_{4}(x),\ \ldots\) and so on are (non-cryptographic) hash functions. A Bloom Filter can have as many hash functions associated with it as you want, but usually, in practice, software developers pick a number between 4 and 8. In the section called Interesting ideas to explore, you will see why separate hash functions are not strictly needed: one or two are enough to generate the others.
For practical reasons, let’s suppose our functions return uint32_t values, which are natural numbers between \(0\) and \(2^{32}-1\).
Internally, a Bloom Filter has an associated array-like memory zone where elements can have only two values: 0 and 1. It’s up to us how we organize this area, but the classical implementation uses a Bit Vector (described later in the article).
Given an element \(E\), to insert it in the Bloom Filter we perform the following actions:
we compute the hash values \(h_{1}(E),\ h_{2}(E),\ \ldots\);
the results are uint32_t numbers, so we apply % bloom_filter_size (modulo) to find the actual cells’ positions inside the array;
we set those positions to 1.
To check if an element \(E\) is not in a Bloom Filter, we compute the hash values again, and we test whether the corresponding bits are all 1s or not. If at least one of them is 0, the element was certainly never added.
So, if our Bloom Filter uses 4 hash functions, then for our element E, (at most) 4 bits in the array will be set to 1. If a bit is already set to 1, we don’t perform any change.
Of course, hash collisions can happen: two distinct elements \(E_{1}\) and \(E_{2}\) can have the same hash value.
If the Bit Vector is too small, the 1s set by different elements can also start to overlap with each other, creating what we call false positives.
If you are curious about the math behind it and how to calculate the probability of false positives, the Wikipedia article is quite good.
\(\varepsilon\) is the false positive rate, and to keep it under control, we can fine-tune the actual values of k, n, and m.
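For reference, this is the standard approximation of the false positive rate (it assumes the hash functions behave independently and uniformly), with \(k\) the number of hash functions, \(n\) the number of inserted elements, and \(m\) the number of bits. The second line gives the \(k\) that minimizes it and a worked example for 10 bits per element:

```latex
\begin{aligned}
\varepsilon &\approx \left(1 - e^{-kn/m}\right)^{k}, \qquad k_{\text{opt}} = \frac{m}{n}\ln 2 \\
\frac{m}{n} = 10,\ k = 7 &: \quad \varepsilon \approx \left(1 - e^{-0.7}\right)^{7} \approx 0.503^{7} \approx 0.0082
\end{aligned}
```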
The full code of this article can be found in this GitHub repo:
git clone git@github.com:nomemory/bloomfilters-c.git
We assume that for our platform 1 byte is always 8 bits.
Before implementing the Bloom Filter itself, we first need to implement an auxiliary data structure called a Bit Vector (also known as a Bit Array).
A Bit Vector is an array data structure that compactly stores bits, meaning that all the elements of this particular array are either 0 or 1. But in the C programming language, there is no data type as small as 1 bit. The smallest you can get is char, which is 8 bits, or uint8_t (and its signed counterpart int8_t), which is also 8 bits. It would be a shame to use 8 bits to store either 1 or 0 when you actually need only one.
Initially, I wanted to use xtrapbits.h, a bulletproof macro implementation that has worked since 1987; simply put, a piece of code that’s about my age. Eventually, I opted to write my own straightforward implementation, which you can find in bitutil.c and bitutil.h, stripping down functionality to exactly what I need in the context of Bloom Filters.
Unfortunately, C doesn’t provide support at the language level for Bit Vectors, but fortunately, through the use of pointers and bitwise operations, we can implement them ourselves.
Our Bit Vector will be composed of small uint32_t chunks of memory, and each of those chunks will hold 32 values (1s and 0s).
typedef struct bit_vect_s {
uint32_t *mem;
size_t size; // The number of bits
} bit_vect;
From a memory perspective things will look like this:
So in n chunks of uint32_t, we will be able to store n * sizeof(uint32_t) * 8 elements (1s or 0s):
If n=2, we would be able to store 2 * 4 * 8 = 64 bits.
If n=3, we would be able to store 3 * 4 * 8 = 96 bits.
… and so on.
The method to allocate memory for a new Bit Vector has the following signature, where num_bits represents the (exact) number of bits our Bit Vector will contain:
bit_vect *bit_vect_new(size_t num_bits) ...
Implementing it is relatively straightforward; the only “math” involved is to find out how many uint32_t chunks we need to allocate, as the input parameter num_bits is not necessarily a multiple of sizeof(uint32_t)*8.
For example, if num_bits=71, we need at least 3 uint32_t chunks: 2 of them will be fully utilized, and from the third one, we will use only 7 bits out of the 32 available, a compromise in wasting resources we can live with.
To put this into code, we will start by declaring the following macros:
#define BITS_IN_BYTE 8
#define BITS_IN_TYPE(type) (BITS_IN_BYTE * (sizeof(type)))
BITS_IN_BYTE is the number of bits in a byte. On most modern systems, we expect it to be 8, but it’s better to isolate this in a constant than to have magic numbers in our code.
BITS_IN_TYPE is a function-like macro that computes the number of bits associated with a type. If we call BITS_IN_TYPE(uint32_t), the result will be 32 on most modern systems.
Later Edit: user nimaje on GitHub mentioned that limits.h already defines a constant, CHAR_BIT, that holds the number of bits of a char. Instead of defining your own BITS_IN_BYTE constant, it’s a better idea to use this one.
Then we write the actual memory allocator, bit_vect_new:
bit_vect *bit_vect_new(size_t num_bits) {
bit_vect *vect = malloc(sizeof(*vect));
if (NULL==vect) {
fprintf(stderr, "Out of memory.\n");
exit(EXIT_FAILURE);
}
size_t mem_size = num_bits / BITS_IN_TYPE(uint32_t);
// If num_bits is not a multiple of BITS_IN_TYPE(uint32_t),
// we add one more chunk that will be partially occupied
if (num_bits % BITS_IN_TYPE(uint32_t)) {
mem_size++;
}
vect->mem = calloc(mem_size, sizeof(*(vect->mem)));
if (NULL==vect->mem) {
fprintf(stderr, "Out of memory.\n");
exit(EXIT_FAILURE);
}
vect->size = num_bits;
return vect;
}
Freeing the memory for a bit_vect is straightforward and is done in two steps, in this exact order:
we free bit_vect->mem first;
we free the bit_vect itself second.
void bit_vect_free(bit_vect *vect){
free(vect->mem);
free(vect);
}
The function that gets the value of the nth bit from our Bit Vector looks like this:
bool bit_vect_get(bit_vect *vect, size_t bit_idx) {
if (bit_idx>=vect->size) {
fprintf(stderr, "Out of bounds bit_idx=%zu, vect->size=%zu\n",
bit_idx, vect->size);
exit(EXIT_FAILURE);
}
size_t chunk_offset = bit_idx / BITS_IN_TYPE(uint32_t);
size_t bit_offset = bit_idx & (BITS_IN_TYPE(uint32_t)-1);
uint32_t byte = vect->mem[chunk_offset];
return (byte>>bit_offset) & 1;
}
First, it performs a sanity check on the input (bit_idx) to see if the nth bit we are looking for is within our memory bounds.
Second, we need to compute two offsets to determine where the memory location (the bit) we are looking for is situated:
chunk_offset is the uint32_t frame that holds the bit;
bit_offset is the actual position of the bit inside the uint32_t chunk.
For example, if we are searching for bit_idx=60, then chunk_offset = 60 / 32 = 1, and bit_offset = 60 % 32 = 28.
The expression bit_idx & (BITS_IN_TYPE(uint32_t)-1) can be a little confusing, because you would typically expect the % operation to get the remainder, instead of & (32-1). This simple bitwise trick makes the code more efficient by replacing a division with a single bitwise AND on the CPU. It only works if the divisor is a power of two (lucky us). I’ve already explained it here, using this diagram.
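The identity behind the trick, for a non-negative \(x\) and a divisor \(2^{k}\), with the bit_idx=60 example worked out in binary:

```latex
\begin{aligned}
x \bmod 2^{k} &= x \mathbin{\&} \left(2^{k} - 1\right), \quad x \ge 0 \\
60 \mathbin{\&} 31 &= 111100_2 \mathbin{\&} 011111_2 = 011100_2 = 28 = 60 \bmod 32
\end{aligned}
```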
The last line, (byte>>bit_offset) & 1, extracts the actual value of the bit: it shifts the uint32_t chunk right by bit_offset positions and keeps only the lowest bit.
The corresponding code for setting the nth bit is the following:
void bit_vect_set(bit_vect *vect, size_t bit_idx, bool val) {
if (bit_idx>=vect->size) {
fprintf(stderr, "Out of bounds bit_idx=%zu, vect->size=%zu\n",
bit_idx, vect->size);
exit(EXIT_FAILURE);
}
size_t chunk_offset = bit_idx / BITS_IN_TYPE(uint32_t);
size_t bit_offset = bit_idx & (BITS_IN_TYPE(uint32_t)-1);
uint32_t *byte = &(vect->mem[chunk_offset]);
if (val) {
// Sets the bit `bit_idx` to 1 (true)
*byte |= ((uint32_t)1) << bit_offset;
}else {
// Sets the bit `bit_idx` to 0 (false)
*byte &= ~(((uint32_t)1) << bit_offset);
}
}
Yet again, we need to compute the chunk_offset and bit_offset, and then we branch off:
*byte |= ((uint32_t)1) << bit_offset sets the corresponding bit to 1;
*byte &= ~(((uint32_t)1) << bit_offset) sets the corresponding bit to 0.
Just like Hash Tables (previously explained in this article), Bloom Filters make heavy use of Hash Functions. It’s not the purpose of this article to explain how Hash Functions work, as I’ve already done my best here.
My rule of thumb is to use the sdbm and djb2 functions in toy implementations, and something more advanced like MurmurHash, FNV, or SpookyHash for serious stuff. Recently, I’ve also played with chunky64, and the results were good, but I am not sure how popular it is in the real world.
But as I said, the two hash functions I am going to use in this article are sdbm and djb2. They are extremely simple, and they work well enough for our purpose:
#include <stddef.h>
#include <stdint.h>
// Classic initial value of Dan Bernstein's djb2
#define DJB2_INIT 5381
uint32_t djb2(const void *buff, size_t length) {
    uint32_t hash = DJB2_INIT;
    const uint8_t *data = buff;
    for(size_t i = 0; i < length; i++) {
        // hash * 33 + data[i]
        hash = ((hash << 5) + hash) + data[i];
    }
    return hash;
}
uint32_t sdbm(const void *buff, size_t length) {
    uint32_t hash = 0;
    const uint8_t *data = buff;
    for(size_t i = 0; i < length; i++) {
        // hash * 65599 + data[i]
        hash = data[i] + (hash << 6) + (hash << 16) - hash;
    }
    return hash;
}
The only improvement I would make to these two would be to increase the size of the data-reading frame from uint8_t to something bigger, reducing the overall number of loop iterations (and shift operations).
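To get a feel for how the two functions spread a key, the sketch below repeats djb2 and sdbm (with DJB2_INIT assumed to be the classic 5381) and adds a hypothetical bit_position helper that reduces a hash to a bit index, the same way the filter uses hash % size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DJB2_INIT 5381 /* assumption: the classic djb2 seed */

uint32_t djb2(const void *buff, size_t length) {
    uint32_t hash = DJB2_INIT;
    const uint8_t *data = buff;
    for (size_t i = 0; i < length; i++) {
        hash = ((hash << 5) + hash) + data[i]; /* hash * 33 + c */
    }
    return hash;
}

uint32_t sdbm(const void *buff, size_t length) {
    uint32_t hash = 0;
    const uint8_t *data = buff;
    for (size_t i = 0; i < length; i++) {
        hash = data[i] + (hash << 6) + (hash << 16) - hash; /* hash * 65599 + c */
    }
    return hash;
}

/* Hypothetical helper: maps a C string to a bit index in a vector of num_bits bits. */
size_t bit_position(uint32_t (*hash)(const void *, size_t), const char *key, size_t num_bits) {
    return hash(key, strlen(key)) % num_bits;
}
```

With a 1024-bit vector, "abc" lands on bit 139 with djb2 and on bit 98 with sdbm: two different positions for the same key, which is the point of using more than one function.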
Next, we need to make sure that our hash functions share the same signature, so we can typedef them for further use:
typedef uint32_t (*hash32_func)(const void *data, size_t length);
The main idea is that our Bloom Filter will receive several hash functions in the initialization phase, so creating an alias for the type (by using typedef) will make our code more readable when we pass the function pointers around.
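Here is a small sketch of the readability gain: the same array of function pointers declared with and without the typedef. hash_len, hash_first_byte and sum_hashes are throwaway, hypothetical functions used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash32_func)(const void *data, size_t length);

/* Hypothetical toy "hash" functions, only for this illustration. */
uint32_t hash_len(const void *data, size_t length) {
    (void)data;
    return (uint32_t)length;
}
uint32_t hash_first_byte(const void *data, size_t length) {
    return length ? ((const uint8_t *)data)[0] : 0;
}

/* Without the typedef: an array of pointers to functions returning uint32_t. */
uint32_t (*raw_funcs[])(const void *, size_t) = { hash_len, hash_first_byte };

/* With the typedef: the same array, much easier to read. */
hash32_func funcs[] = { hash_len, hash_first_byte };

/* Calls every function in fs on the same buffer and sums the results. */
uint32_t sum_hashes(const hash32_func *fs, size_t n, const void *data, size_t length) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += fs[i](data, length);
    }
    return acc;
}
```

Both arrays have exactly the same type once they decay to pointers, so sum_hashes accepts either of them.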
The Bloom Filter interface looks like this:
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
// bit_vect is the Bit Vector type defined earlier in the article
typedef uint32_t (*hash32_func)(const void *data, size_t length);
typedef struct bloom_filter_s {
    bit_vect *vect;
    hash32_func *hash_functions;
    size_t num_functions;
    size_t num_items;
} bloom_filter;
bloom_filter *bloom_filter_new(size_t size, size_t num_functions, ...);
bloom_filter *bloom_filter_new_default(size_t size);
void bloom_filter_free(bloom_filter *filter);
void bloom_filter_put(bloom_filter *filter, const void *data, size_t length);
void bloom_filter_put_str(bloom_filter *filter, const char *str);
bool bloom_filter_test(bloom_filter *filter, const void *data, size_t length);
bool bloom_filter_test_str(bloom_filter *filter, const char *str);
It’s nothing fancy, just a simple API. struct bloom_filter_s defines the internal data structure:
- vect, the bit_vect where we keep our 1s and 0s;
- num_items, the number of items added to the filter so far (the size of the Bit Vector itself is kept in vect->size);
- hash_functions and num_functions, for keeping track of the internal hash functions.
The code for allocating dynamic memory for our newly defined struct (bloom_filter) is the following:
#include <stdarg.h>   // va_list, va_start, va_arg, va_end
#include <stdio.h>
#include <stdlib.h>
bloom_filter *bloom_filter_new(size_t size, size_t num_functions, ...) {
    va_list argp;
    bloom_filter *filter = malloc(sizeof(*filter));
    if (NULL==filter) {
        fprintf(stderr, "Out of memory.\n");
        exit(EXIT_FAILURE);
    }
    filter->num_items = 0;
    filter->vect = bit_vect_new(size);
    filter->num_functions = num_functions;
    filter->hash_functions = malloc(sizeof(hash32_func)*num_functions);
    if (NULL==filter->hash_functions) {
        fprintf(stderr, "Out of memory.\n");
        exit(EXIT_FAILURE);
    }
    // Read `num_functions` hash functions from the variadic arguments
    va_start(argp, num_functions);
    for(size_t i = 0; i < num_functions; i++) {
        filter->hash_functions[i] = va_arg(argp, hash32_func);
    }
    va_end(argp);
    return filter;
}
bloom_filter *bloom_filter_new_default(size_t size) {
    // Default filter: two hash functions, djb2 and sdbm
    return bloom_filter_new(size, 2, djb2, sdbm);
}
void bloom_filter_free(bloom_filter *filter) {
    bit_vect_free(filter->vect);
    free(filter->hash_functions);
    free(filter);
}
For an inexperienced eye, the only thing that might look confusing is the ... notation in the bloom_filter_new signature. bloom_filter_new is a variadic function: after its named parameters it accepts a variable number of extra arguments, and num_functions tells it how many of them to read. Those extra arguments are the actual hash functions (hash32_func).
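The same pattern, stripped of everything Bloom-specific, looks like the sketch below. collect_hash_funcs, hash_len and hash_first_byte are hypothetical names; only the va_start/va_arg/va_end usage mirrors bloom_filter_new.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash32_func)(const void *data, size_t length);

/* Hypothetical toy hash functions for the demo. */
uint32_t hash_len(const void *data, size_t length) {
    (void)data;
    return (uint32_t)length;
}
uint32_t hash_first_byte(const void *data, size_t length) {
    return length ? ((const uint8_t *)data)[0] : 0;
}

/* Hypothetical helper: n says how many hash32_func arguments follow it. */
void collect_hash_funcs(hash32_func *out, size_t n, ...) {
    va_list argp;
    va_start(argp, n);                      /* start right after the last named parameter */
    for (size_t i = 0; i < n; i++) {
        out[i] = va_arg(argp, hash32_func); /* each extra argument is a function pointer */
    }
    va_end(argp);
}
```

For example, collect_hash_funcs(fs, 2, hash_len, hash_first_byte) fills fs with the two pointers. Note that va_arg cannot tell how many arguments were actually passed: if the count is larger than the number of functions supplied, the behaviour is undefined, which is why the count has to be an explicit parameter.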
The code that adds an element to the Bloom Filter is the following. For each of the hash_functions, we hash the element and set the bit at position hash % size in the Bit Vector to 1:
void bloom_filter_put(bloom_filter *filter, const void *data, size_t length){
    for(size_t i = 0; i < filter->num_functions; i++) {
        uint32_t cur_hash = filter->hash_functions[i](data, length);
        bit_vect_set(filter->vect, cur_hash % filter->vect->size, true);
    }
    // We've just added a new item, so we increment the counter
    filter->num_items++;
}
C doesn’t support function overloading, so to make it simple to add a string (char*) to the filter, we write a small helper function:
void bloom_filter_put_str(bloom_filter *filter, const char *str) {
bloom_filter_put(filter, str, strlen(str));
}
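The code for testing if an element exists in the Bloom Filter is the following. For each of the hash_functions, we check the corresponding bit in the bit_vect: if any of them is 0, the element was definitely never added; if all of them are 1, the element was probably added (false positives are possible).
bool bloom_filter_test(bloom_filter *filter, const void *data, size_t length) {
    for(size_t i = 0; i < filter->num_functions; i++) {
        uint32_t cur_hash = filter->hash_functions[i](data, length);
        if (!bit_vect_get(filter->vect, cur_hash % filter->vect->size)) {
            // One of the bits is 0: the element was never added
            return false;
        }
    }
    // All the bits are 1: the element was probably added
    return true;
}
bool bloom_filter_test_str(bloom_filter *filter, const char *str) {
    return bloom_filter_test(filter, str, strlen(str));
}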
#include <stdio.h>
#include <stdlib.h>
#include "bloom.h"
int main(void) {
    bloom_filter *filter = bloom_filter_new_default(1024);
    bloom_filter_put_str(filter, "abc");
    printf("%d\n", bloom_filter_test_str(filter, "abc"));
    printf("%d\n", bloom_filter_test_str(filter, "bcd"));
    printf("%d\n", bloom_filter_test_str(filter, "0"));
    printf("%d\n", bloom_filter_test_str(filter, "1"));
    bloom_filter_put_str(filter, "2");
    printf("%d\n", bloom_filter_test_str(filter, "2"));
    bloom_filter_free(filter);
    return 0;
}
Output:
1
0
0
0
1
Following a discussion on reddit, Chris Wellons pointed out that Bloom Filters don’t actually need k distinct hash functions. We can have just one that generates a hash value, and from that value, through permutations, we can generate as many new hash values as we want:
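#define K 4 // number of hash values we need (example value)
uint64_t hash64(void *buf, size_t len);
uint64_t permute64(uint64_t x, uint64_t key);
// Hypothetical wrapper: hash the buffer once, then derive K values by permuting that hash
void compute_hashes(void *buf, size_t len, uint32_t hashes[K]) {
    uint64_t bufhash = hash64(buf, len);
    for (int i = 0; i < K; i++) {
        hashes[i] = (uint32_t)permute64(bufhash, i);
    }
}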
Where permute64 can look like this:
uint64_t permute64(uint64_t x, uint64_t key)
{
    x += key;
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9U;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebU;
    x ^= x >> 31;
    return x;
}
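The shifts and constants in permute64 are those of the SplitMix64 finalizer, a bijective 64-bit mixer, so for the same hash different keys always give different outputs. The sketch below wires it to a concrete hash64; using FNV-1a (64-bit) there is an assumption, any decent 64-bit hash would do, and bloom_positions is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: FNV-1a 64-bit stands in for the unspecified hash64. */
uint64_t hash64(const void *buf, size_t len) {
    const uint8_t *data = buf;
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;          /* FNV-1a 64-bit prime */
    }
    return h;
}

/* SplitMix64-style mixer, offset by `key`. */
uint64_t permute64(uint64_t x, uint64_t key) {
    x += key;
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hypothetical helper: k bit positions for `buf`, hashing the buffer only once. */
void bloom_positions(const void *buf, size_t len, size_t k, size_t num_bits, size_t *positions) {
    uint64_t bufhash = hash64(buf, len);
    for (size_t i = 0; i < k; i++) {
        positions[i] = (size_t)(permute64(bufhash, i) % num_bits);
    }
}
```

The buffer is read once, no matter how many positions we need; each extra position only costs a few multiplications and shifts.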
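Another interesting idea for avoiding the use of multiple (separate) hash functions comes from Building a Better Bloom Filter. The authors suggest that two hash functions \(h_{1}(x)\) and \(h_{2}(x)\) are enough to generate others in the form \(g_{i}(x)\):
\[ g_{i}(x) = \left(h_{1}(x) + i \cdot h_{2}(x)\right) \bmod m, \quad i = 0, 1, \ldots, k-1 \]
where \(m\) is the number of bits in the filter.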
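A minimal sketch of this double-hashing scheme, reusing the article's two functions: choosing djb2 as \(h_{1}\) and sdbm as \(h_{2}\) is an assumption (the paper does not prescribe them), DJB2_INIT is assumed to be 5381, and double_hash_positions is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define DJB2_INIT 5381 /* assumption: the classic djb2 seed */

uint32_t djb2(const void *buff, size_t length) {
    uint32_t hash = DJB2_INIT;
    const uint8_t *data = buff;
    for (size_t i = 0; i < length; i++) {
        hash = ((hash << 5) + hash) + data[i];
    }
    return hash;
}

uint32_t sdbm(const void *buff, size_t length) {
    uint32_t hash = 0;
    const uint8_t *data = buff;
    for (size_t i = 0; i < length; i++) {
        hash = data[i] + (hash << 6) + (hash << 16) - hash;
    }
    return hash;
}

/* g_i(x) = (h1(x) + i * h2(x)) mod m, for i = 0 .. k-1.
   64-bit arithmetic keeps i * h2 from wrapping for any reasonable k. */
void double_hash_positions(const void *data, size_t length, size_t k, size_t m, size_t *positions) {
    uint64_t h1 = djb2(data, length);
    uint64_t h2 = sdbm(data, length);
    for (size_t i = 0; i < k; i++) {
        positions[i] = (size_t)((h1 + (uint64_t)i * h2) % m);
    }
}
```

For "abc" and m = 1024, the first three positions are 139, 237 and 335. One caveat of the scheme: if \(h_{2}(x)\) is a multiple of \(m\), every \(g_{i}(x)\) collapses onto the same bit; when \(m\) is a power of two, a common guard is to force \(h_{2}\) to be odd.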
If you want to read more about how to implement Bloom Filters in the C language, you can check this article: “How to write a better Bloom Filter” by Drew DeVault.
Another interesting C implementation is bloomd which is a Network Daemon for Bloom Filters written by Armon Dadgar, and even if the project doesn’t seem to be maintained anymore, it is quite an exciting piece of software to look at.
Other references:
A proposed efficient alternative to Bloom Filters is the Cuckoo Filter, but before writing about it, I need to do my homework first.