Latin1 vs UTF8

Latin1 was the early default character set for encoding documents delivered via HTTP for MIME types beginning with text/. Today, only around 1.1% of websites on the internet use the encoding, along with some older applications. However, it is still the most popular single-byte character encoding scheme in use today. A funny thing about Latin1 encoding is that it maps every byte from 0 to 255 to a valid character. This means that literally any sequence of bytes can be interpreted as a valid string. The main drawback is that it only supports characters from Western European languages.

The same is not true for UTF8. Unlike Latin1, UTF8 supports a vastly broader range of characters from different languages and scripts. But as a consequence, not every byte sequence is valid. This is due to UTF8's added complexity: it uses multi-byte sequences for characters beyond the ASCII range. This is also why you can't just throw any sequence of bytes at it and expect a valid string back.
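To see this in practice, here is a minimal Python 3 sketch showing that any byte sequence decodes under Latin1, while the same bytes can fail under UTF8:

# Any byte sequence is a valid Latin-1 string, because
# Latin-1 maps all 256 byte values to characters.
data = bytes([0xC3, 0x28, 0xFF, 0x41])
print(data.decode("latin-1"))  # always succeeds

# The same bytes are not valid UTF-8: 0xC3 opens a multi-byte
# sequence, but 0x28 is not a valid continuation byte.
try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err)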

Binary, IPv4, and Subnets

The IPv4 protocol, which we still broadly (but not exclusively) use today, rests on an addressing system that was designed in the 1970s and formally published in 1980. It uses 32-bit addresses written in dotted-quad notation, which can be expressed in various representations, and a straightforward subnetting scheme.

Binary

For example, the address 1.0.0.1, which belongs to CloudFlare, consists of four separate octets: 1, 0, 0, and 1. Octets here are simply integers ranging from 0 to 255. We can convert these numbers into alternative representations like binary, which expresses a number N in base-2, written N₂. In binary, we represent numbers like these by sums of powers of 2. If this doesn't make sense, I'll explain below:

0 = 0 = (2^0 * 0)
1 = 1 = (2^0 * 1)
2 = 10 = (2^1 * 1) + (2^0 * 0)
3 = 11 = (2^1 * 1) + (2^0 * 1)
4 = 100 = (2^2 * 1) + (2^1 * 0) + (2^0 * 0)
5 = 101 = (2^2 * 1) + (2^1 * 0) + (2^0 * 1)
6 = 110 = (2^2 * 1) + (2^1 * 1) + (2^0 * 0)
7 = 111 = (2^2 * 1) + (2^1 * 1) + (2^0 * 1)
8 = 1000 = (2^3 * 1) + (2^2 * 0) + (2^1 * 0) + (2^0 * 0)

Decimal to Binary

To find the binary representation of a number like 3, we repeatedly divide by 2 and note the remainder each time. First, 3 ÷ 2 gives quotient 1 with remainder 1; then 1 ÷ 2 gives quotient 0 with remainder 1. Reading the remainders from bottom to top, we get "11" as the binary representation of 3.
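As a sketch of the same procedure in Python (to_binary is a hypothetical helper name, not a standard function):

def to_binary(n):
    # Repeatedly divide by 2, collecting remainders;
    # the remainders read in reverse are the binary digits.
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))
        n //= 2
    return "".join(reversed(digits))

print(to_binary(3))  # 11
print(to_binary(8))  # 1000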

Binary to Decimal

But how do we convert the binary representation of three back to decimal form? Recall that three is "11" in binary.

So we calculate by powers of 2, from the leftmost, most significant bit at position N down to the rightmost, least significant bit at position 0, i.e. from 2^N to 2^0. Each time, we multiply that power of 2 by the binary value, 0 or 1, of the corresponding digit. Then we take the sum of this sequence. For example, for the binary representation of three, which is "11":

(2^1 * 1) + (2^0 * 1) = 2 + 1 = 3

And for the binary representation of two, which is "10", we do this:

(2^1 * 1) + (2^0 * 0) = 2 + 0 = 2
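And the reverse direction as a Python sketch (to_decimal is likewise a hypothetical helper; Python's built-in int(bits, 2) performs the same conversion):

def to_decimal(bits):
    # Sum each digit times its power of 2, from the least
    # significant bit (2^0) up to the most significant (2^N).
    total = 0
    for i, digit in enumerate(reversed(bits)):
        total += int(digit) * 2**i
    return total

print(to_decimal("11"))  # (2^1 * 1) + (2^0 * 1) = 3
print(to_decimal("10"))  # (2^1 * 1) + (2^0 * 0) = 2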

And in software, we expand this notion and use the 8-bit standard to construct a byte representing our possible octet values from 0 to 255. We write them like this:

0 = 00000000 
1 = 00000001
2 = 00000010
..
254 = 11111110
255 = 11111111
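A quick Python sketch of this zero-padded 8-bit form, using the standard "08b" format specifier:

for n in (0, 1, 2, 254, 255):
    # "08b" means: binary, zero-padded to 8 digits
    print(n, "=", format(n, "08b"))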

32-bit addressing

The first octet in CloudFlare's DNS IPv4 address 1.0.0.1 is represented by 1₂, and 0 by 0₂. This format follows for any other number in an IP address. And altogether, padding each octet to 8 bits, we can rewrite the address 1.0.0.1 in binary like so:

00000001.00000000.00000000.00000001

We say this binary represents an address within the 32-bit IPv4 addressing system. That is, each IP address is composed of four 8-bit octets, and each bit can be 1 or 0. Therefore, we say there are 2^32 possible IPv4 addresses, or approximately 4.3 billion possible unique addresses.
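A minimal sketch of that conversion in Python (to_binary_address is a hypothetical helper name):

def to_binary_address(ip):
    # Convert each dotted-quad octet to an 8-bit binary string.
    return ".".join(format(int(octet), "08b") for octet in ip.split("."))

print(to_binary_address("1.0.0.1"))
# 00000001.00000000.00000000.00000001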

IPv4 Subnetting

Subnets are a strategy for dividing a particular network into subnetwork blocks. This involves what we call a "subnet mask."

But to get a better understanding and visualize it: each prefix length, e.g. /30, delineates a range of addresses for a particular routing table. And each implies a subnet mask, like this (a short sketch that computes these masks follows the list):

/0 - 0.0.0.0 
/1 - 128.0.0.0
/2 - 192.0.0.0
/3 - 224.0.0.0
/4 - 240.0.0.0
/5 - 248.0.0.0
/6 - 252.0.0.0
/7 - 254.0.0.0
/8 - 255.0.0.0 (Class A)
/9 - 255.128.0.0
/10 - 255.192.0.0
/11 - 255.224.0.0
/12 - 255.240.0.0
/13 - 255.248.0.0
/14 - 255.252.0.0
/15 - 255.254.0.0
/16 - 255.255.0.0 (Class B)
/17 - 255.255.128.0
/18 - 255.255.192.0
/19 - 255.255.224.0
/20 - 255.255.240.0
/21 - 255.255.248.0
/22 - 255.255.252.0
/23 - 255.255.254.0
/24 - 255.255.255.0 (Class C)
/25 - 255.255.255.128
/26 - 255.255.255.192
/27 - 255.255.255.224
/28 - 255.255.255.240
/29 - 255.255.255.248
/30 - 255.255.255.252
/31 - 255.255.255.254 (Used for point-to-point links)
/32 - 255.255.255.255 (Single IP address)
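As promised above, here is a sketch that derives any of these masks from its prefix length, by setting the top N bits of a 32-bit value (prefix_to_mask is a hypothetical helper):

def prefix_to_mask(prefix):
    # Set the top `prefix` bits of a 32-bit value, then split
    # the result back into four dotted-quad octets.
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(prefix_to_mask(8))   # 255.0.0.0
print(prefix_to_mask(24))  # 255.255.255.0
print(prefix_to_mask(30))  # 255.255.255.252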

A subnet mask is a bitmask that denotes how many bits are used for network addressing and which are left over for hosts.

For example, the subnet mask for the /30 block is 255.255.255.252. In binary, this mask converts to:

11111111.11111111.11111111.11111100

This particular subnet mask assigns 30 bits to network addressing, leaving us only 2 bits for our own possible host addressing, because IPv4 is a 32-bit system.

And since we're working in base-2, we raise 2 to the number of leftover host bits we have, which in this case is 2.

So we say 2^2 leaves us with a resulting 4 possible host addresses.

For example, we can take our IP address 1.0.0.1, and subnet mask 255.255.255.252, and perform a bitwise AND operation:

00000001.00000000.00000000.00000001
11111111.11111111.11111111.11111100
-----------------------------------
00000001.00000000.00000000.00000000

We have taken the IP address and subnet mask, and compared each corresponding bit. In an AND operation, 0 and 0 gives 0; 1 and 0 gives 0; only 1 and 1 gives 1. Here, this gives us the network address of the range: the binary output is equivalent to 1.0.0.0, which is our base network address.
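The same AND operation as a Python sketch, working on the 32-bit integer forms of the address and mask (to_int and to_ip are hypothetical helpers):

def to_int(ip):
    # Pack four octets into one 32-bit integer.
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_ip(n):
    # Unpack a 32-bit integer back into dotted-quad form.
    return ".".join(str((n >> s) & 0xFF) for s in (24, 16, 8, 0))

network = to_int("1.0.0.1") & to_int("255.255.255.252")
print(to_ip(network))  # 1.0.0.0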

And following our rule for calculating the host addresses using the leftover bits we have, we say we have 2^2, or 4, IPv4 addresses, including our network address: 1.0.0.0, 1.0.0.1, 1.0.0.2, and 1.0.0.3.

The lower address, 1.0.0.0, is the network address, while the upper address, 1.0.0.3, denotes the broadcast address. The two addresses between, 1.0.0.1 and 1.0.0.2, denote the actual usable host range.

If we were calculating 1.0.0.1 with the /29 block, subnet mask 255.255.255.248, we would have 29 bits set aside for network addressing and 3 bits for our own host addressing. So we would again take 2 and raise it to the number of free bits we have: 2^3, which gives us 8. Hence, the 8 addresses, with our actual usable range in between the network and broadcast addresses (verified with a short sketch after the list):

1.0.0.0 (Network address)
1.0.0.1
1.0.0.2
1.0.0.3
1.0.0.4
1.0.0.5
1.0.0.6
1.0.0.7 (Broadcast address)
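We can check this enumeration with Python's standard ipaddress module:

import ipaddress

net = ipaddress.ip_network("1.0.0.0/29")
print(net.network_address)    # 1.0.0.0
print(net.broadcast_address)  # 1.0.0.7
print(list(net.hosts()))      # usable range: 1.0.0.1 through 1.0.0.6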
