
Latin1 vs UTF8

Latin1 was the early default character set for encoding documents delivered via HTTP for MIME types beginning with text/. Today, only around 1.1% of websites on the internet use the encoding, along with some older applications. However, it is still the most popular single-byte character encoding scheme in use today. A funny thing about Latin1 encoding is that it maps every byte from 0 to 255 to a valid character. This means that literally any sequence of bytes can be interpreted as a valid string. The main drawback is that it only supports characters from Western European languages. The same is not true for UTF8. Unlike Latin1, UTF8 supports a vastly broader range of characters from different languages and scripts. But as a consequence, not every byte sequence is valid. This fact is due to UTF8's added complexity, using multi-byte sequences for characters beyond the general ASCII range. This is also why you can't just throw any sequence of bytes at it and ex...

Binary, IPv4, and Subnets

The IPv4 protocol, which we still broadly (but not exclusively) use today, rests on an addressing system that was designed in the 1970s and formally published in 1980. It uses 32-bit addresses, usually written in dotted-quad notation though other notations exist, and a straightforward subnetting scheme.

Binary

For example, the address 1.0.0.1, which belongs to Cloudflare, is made up of four separate octets: 1, 0, 0, and 1. An octet here is simply an integer ranging from 0 to 255. We can convert these numbers into alternative representations like binary, which writes a number N in base 2. Each digit in a binary number is either 0 or 1 and is multiplied by a power of 2, starting from 2^0 at the rightmost digit, the least significant bit. For example:

0 = 0 = (2^0 * 0)
1 = 1 = (2^0 * 1)
2 = 10 = (2^1 * 1) + (2^0 * 0)
3 = 11 = (2^1 * 1) + (2^0 * 1)
4 = 100 = (2^2 * 1) + (2^1 * 0) + (2^0 * 0)
5 = 101 = (2^2 * 1) + (2^1 * 0) + (2^0 * 1)
6 = 110 = (2^2 * 1) + (2^1 * 1) + (2^0 * 0)
7 = 111 = (2^2 * 1) + (2^1 * 1) + (2^0 * 1)
8 = 1000 = (2^3 * 1) + (2^2 * 0) + (2^1 * 0) + (2^0 * 0)
9 = 1001 = (2^3 * 1) + (2^2 * 0) + (2^1 * 0) + (2^0 * 1)

Decimal to Binary

To find the binary of a number like 3, we repeatedly divide by 2, noting the remainder each time, until the quotient reaches 0. First, 3 ÷ 2 gives a quotient of 1 and a remainder of 1; then 1 ÷ 2 gives a quotient of 0 and another remainder of 1. Reading the remainders from last to first, we get "11" as the binary representation of 3.
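The repeated-division procedure can be sketched in a few lines of Python. This is only an illustration; the function name to_binary is made up for this example, and in practice Python's built-in bin() does the same job.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        # divmod returns (quotient, remainder) in one step
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    # The remainders come out least significant bit first, so reverse them
    return "".join(reversed(remainders))


print(to_binary(3))  # 11
print(to_binary(9))  # 1001
```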

Binary to Decimal

But how do we convert a binary representation of three back to decimal form? The number three is "11" in binary representation.

We work through the powers of 2 from the leftmost digit, the most significant bit, down to the rightmost digit, the least significant bit, whose power is 2^0. For a binary number with N + 1 digits, that means going from 2^N to 2^0. We multiply each power by the value of its digit, 0 or 1, and then take the sum. For example, for the binary representation of three, which is "11":

(2^1 * 1) + (2^0 * 1) = 2 + 1 = 3

And for the binary representation of two, which is "10", we do this:

(2^1 * 1) + (2^0 * 0) = 2 + 0 = 2
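The same sum of powers can be written as a short Python sketch. The name to_decimal is hypothetical, chosen for this example; the built-in int(bits, 2) gives the same result.

```python
def to_decimal(bits: str) -> int:
    """Convert a binary string to an integer by summing powers of 2."""
    total = 0
    # Walk from the rightmost digit (2^0) to the leftmost digit
    for position, digit in enumerate(reversed(bits)):
        if digit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {digit!r}")
        total += (2 ** position) * int(digit)
    return total


print(to_decimal("11"))  # 3
print(to_decimal("10"))  # 2
```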

Expanding on this, we use the standard 8-bit byte to represent every possible octet value from 0 to 255, padding with leading zeros. We write them like this:

0 = 00000000 
1 = 00000001
2 = 00000010
..
254 = 11111110
255 = 11111111

32-bit addressing

Why does this matter? Well, it's applicable to almost anything involving networks. For example, the first octet in Cloudflare's DNS IPv4 address, 1.0.0.1, is 1, which is 1 in base 2, and the second octet, 0, is 0 in base 2. Padded to 8 bits, these are 00000001 and 00000000. The same applies to every other octet in an IP address. Altogether, we can rewrite the address 1.0.0.1 in binary like so:

00000001.00000000.00000000.00000001
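As a rough sketch, the conversion from dotted-quad notation to its binary form might look like this in Python. The helper name dotted_quad_to_binary is an assumption made for this example.

```python
def dotted_quad_to_binary(address: str) -> str:
    """Render each octet of a dotted-quad IPv4 address as 8 binary digits."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid dotted-quad address: {address!r}")
    # "08b" means: binary, zero-padded to a width of 8
    return ".".join(format(octet, "08b") for octet in octets)


print(dotted_quad_to_binary("1.0.0.1"))
# 00000001.00000000.00000000.00000001
```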

We say this binary represents an address within the 32-bit IPv4 addressing system. That is, each IP address is composed of four 8-bit octets. And every address has 32 bits. And each bit can be 1 or 0.

Therefore, there are 2^32 possible IPv4 addresses, or approximately 4.3 billion unique addresses.
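To make the size of the address space concrete, here is a small sketch using Python's standard ipaddress module, which can treat an IPv4 address as a single 32-bit integer. The names TOTAL_ADDRESSES and address_to_int are made up for this example.

```python
import ipaddress

# Every one of the 32 bits can be 0 or 1
TOTAL_ADDRESSES = 2 ** 32


def address_to_int(address: str) -> int:
    """Return the 32-bit integer value of a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(address))


print(TOTAL_ADDRESSES)                    # 4294967296
print(address_to_int("1.0.0.1"))          # 16777217
print(address_to_int("255.255.255.255"))  # 4294967295
```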

IPv4 Subnetting

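Subnets are a strategy for dividing IP addresses into subnetwork blocks. This involves what we call "subnet masks", which mark where the network part of an address ends and the host part begins.

Each subnet mask corresponds to a number of bits assigned for network addressing, with the rest of the bits left for hosts. Historically, addresses were grouped into classes by their first octet, each with a default split. This classful scheme has since been superseded by the CIDR prefix notation used further down, but the classes are as follows: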

Class A (0-127): For large organizations. The first byte is used for the network, and the last three bytes are for hosts. E.g. 10.0.0.0 to 10.255.255.255.

Class B (128-191): For medium-sized networks. The first two bytes are for the network, and the last two are for hosts. E.g. 152.93.0.0 to 152.93.255.255.

Class C (192-223): For small networks. The first three bytes define the network, and the last byte is for hosts. E.g. 200.10.10.0 to 200.10.10.255.

Class D (224-239): Used for multicast, not for general public use.

Class E (240-255): Reserved for experimental and future use, not used on the public internet.
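The class ranges listed above depend only on the first octet, so a lookup can be sketched like this. The function name address_class is hypothetical, and the sketch only mirrors the historical ranges as listed here.

```python
def address_class(address: str) -> str:
    """Return the historical class (A to E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError(f"first octet out of range: {first_octet}")
    if first_octet <= 127:
        return "A"
    if first_octet <= 191:
        return "B"
    if first_octet <= 223:
        return "C"
    if first_octet <= 239:
        return "D"
    return "E"


print(address_class("10.0.0.1"))     # A
print(address_class("152.93.0.1"))   # B
print(address_class("200.10.10.5"))  # C
```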

To visualize this, note that an address written with a prefix length, such as 10.0.0.1/32, denotes either a single address or a range of addresses.

For example, 10.0.0.1/32 refers to a single machine, while 0.0.0.0/0 refers to the entire space of possible IPv4 addresses. A subnet mask is a bitmask that denotes how many bits are used for the network and how many are left for hosts.

/0 - 0.0.0.0 
/1 - 128.0.0.0
/2 - 192.0.0.0
/3 - 224.0.0.0
/4 - 240.0.0.0
/5 - 248.0.0.0
/6 - 252.0.0.0
/7 - 254.0.0.0
/8 - 255.0.0.0 (Class A)
/9 - 255.128.0.0
/10 - 255.192.0.0
/11 - 255.224.0.0
/12 - 255.240.0.0
/13 - 255.248.0.0
/14 - 255.252.0.0
/15 - 255.254.0.0
/16 - 255.255.0.0 (Class B)
/17 - 255.255.128.0
/18 - 255.255.192.0
/19 - 255.255.224.0
/20 - 255.255.240.0
/21 - 255.255.248.0
/22 - 255.255.252.0
/23 - 255.255.254.0
/24 - 255.255.255.0 (Class C)
/25 - 255.255.255.128
/26 - 255.255.255.192
/27 - 255.255.255.224
/28 - 255.255.255.240
/29 - 255.255.255.248
/30 - 255.255.255.252
/31 - 255.255.255.254 (Used for point-to-point links)
/32 - 255.255.255.255 (Single IP address)
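Every row in the table above follows one rule: set the leftmost bits given by the prefix length to 1 and the rest to 0. A minimal sketch, with prefix_to_mask as a made-up helper name:

```python
def prefix_to_mask(prefix: int) -> str:
    """Return the dotted-quad subnet mask for a prefix length from 0 to 32."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix must be between 0 and 32")
    # Shift 32 one-bits left, then keep only the low 32 bits
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    # Split the 32-bit value back into four octets
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(prefix_to_mask(8))   # 255.0.0.0
print(prefix_to_mask(24))  # 255.255.255.0
print(prefix_to_mask(30))  # 255.255.255.252
```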

In practical terms, a /24 network has 254 usable IP addresses for devices, while a /30 network has only 2 usable IP addresses, with the remaining two being reserved for the network and broadcast addresses.

For example, the subnet mask for the /30 block is 255.255.255.252. In binary, this address converts to the values:

11111111.11111111.11111111.11111100

This particular subnet mask assigns 30 bits to network addressing. Because IPv4 is a 32-bit system, that leaves only 2 bits for host addressing.

And because we're using base 2, we raise 2 to the number of leftover host bits, which in this case is 2.

So 2^2 gives us 4 possible addresses in the block.
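The arithmetic can be sketched as follows. The function name address_counts is hypothetical, and the handling of /31 and /32 is an assumption of this sketch: it treats them as having no separate network or broadcast address, matching the notes in the table above.

```python
def address_counts(prefix: int) -> tuple[int, int]:
    """Return (total addresses, usable host addresses) for a prefix length."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix must be between 0 and 32")
    host_bits = 32 - prefix
    total = 2 ** host_bits
    # Assumption: /31 and /32 reserve nothing; all other blocks
    # reserve the network address and the broadcast address.
    usable = total if prefix >= 31 else total - 2
    return total, usable


print(address_counts(30))  # (4, 2)
print(address_counts(24))  # (256, 254)
```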

For example, we can take our IP address 1.0.0.1, and subnet mask 255.255.255.252, and perform a bitwise AND operation:

00000001.00000000.00000000.00000001
AND
11111111.11111111.11111111.11111100
-------
00000001.00000000.00000000.00000000

We have taken the IP address and subnet mask, and compared the binary of each corresponding bit. In an AND operation, if we have 0 and 0, the result is 0. If we have 1 and 0, the result is 0. But if we have 1 and 1, the result is 1. Here, this gives us the network address of the range. It produces the binary output equivalent to 1.0.0.0, which is our base network address.
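Here is the same AND operation as a Python sketch, where the & operator does the bit-by-bit comparison for us. The helper names to_int, to_dotted, and network_address are made up for this example.

```python
def to_int(address: str) -> int:
    """Pack a dotted-quad address into one 32-bit integer."""
    value = 0
    for part in address.split("."):
        value = (value << 8) | int(part)
    return value


def to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-quad notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def network_address(address: str, mask: str) -> str:
    """Bitwise AND of address and mask gives the network address."""
    return to_dotted(to_int(address) & to_int(mask))


print(network_address("1.0.0.1", "255.255.255.252"))  # 1.0.0.0
```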

And following our rule for calculating the host addresses using the leftover bits we have, we say we have 2^2, or 4 IPv4 addresses, including our network address: 1.0.0.0, 1.0.0.1, 1.0.0.2, and 1.0.0.3.

The lower address, 1.0.0.0, is the network address, while the upper address, 1.0.0.3, denotes the broadcast address. The two addresses between, 1.0.0.1 and 1.0.0.2, denote the actual usable host range.

If we instead place 1.0.0.1 in a /29 block, with subnet mask 255.255.255.248, we would have 29 bits for network addressing and 3 bits for host addressing. So we would again take 2 and raise it to the number of free bits we have: 2^3, which gives us 8. Hence the 8 addresses, with our actual usable range in between the network and broadcast addresses:

1.0.0.0 (Network address)
1.0.0.1
1.0.0.2
1.0.0.3
1.0.0.4
1.0.0.5
1.0.0.6
1.0.0.7 (Broadcast address)
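To check a listing like this without doing the math by hand, Python's standard ipaddress module can do the work. This is just one convenient way to verify it; describe_block is a made-up wrapper name, and strict=False tells ip_network to accept an address with host bits set and mask them off.

```python
import ipaddress


def describe_block(cidr: str) -> tuple[str, list[str], str]:
    """Return (network address, usable hosts, broadcast address) for a block."""
    network = ipaddress.ip_network(cidr, strict=False)
    hosts = [str(host) for host in network.hosts()]
    return str(network.network_address), hosts, str(network.broadcast_address)


network, hosts, broadcast = describe_block("1.0.0.1/29")
print(network)    # 1.0.0.0
print(hosts)      # ['1.0.0.1', '1.0.0.2', '1.0.0.3', '1.0.0.4', '1.0.0.5', '1.0.0.6']
print(broadcast)  # 1.0.0.7
```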

Some useful references:

Understand TCP/IP addressing and subnetting basics

RFC 791, the IPv4 (Internet Protocol) specification
