Skip to main content

Latin1 vs UTF8

Latin1 was the early default character set for encoding documents delivered via HTTP for MIME types beginning with /text . Today, only around only 1.1% of websites on the internet use the encoding, along with some older applications. However, it is still the most popular single-byte character encoding scheme in use today. A funny thing about Latin1 encoding is that it maps every byte from 0 to 255 to a valid character. This means that literally any sequence of bytes can be interpreted as a valid string. The main drawback is that it only supports characters from Western European languages. The same is not true for UTF8. Unlike Latin1, UTF8 supports a vastly broader range of characters from different languages and scripts. But as a consequence, not every byte sequence is valid. This fact is due to UTF8's added complexity, using multi-byte sequences for characters beyond the general ASCII range. This is also why you can't just throw any sequence of bytes at it and ex...

A Parlor Trick with Primitive Roots

OK, the title is a bit of a pun -- really, the parlor trick uses elementary number theory. A primitive root modulo n however, is an integer g such that every integer relatively prime to n can be expressed as some power of g modulo n. In other words, g can generate all numbers relatively prime to n through its powers.

When dealing with modular arithmetic, cyclic groups, and primitive roots, we find patterns emerge. For example, we can see the powers of 3 are congruent to a cyclic pattern that repeats with numbers modulo 7, the powers of 3 give: 3, 2, 6, 4, 5, 1 — and then it loops back to 3.

\begin{array}{rcrcrcrcrcr}3^{1}&=&3^{0}\times 3&\equiv &1\times 3&=&3&\equiv &3{\pmod {7}}\\3^{2}&=&3^{1}\times 3&\equiv &3\times 3&=&9&\equiv &2{\pmod {7}}\\3^{3}&=&3^{2}\times 3&\equiv &2\times 3&=&6&\equiv &6{\pmod {7}}\\3^{4}&=&3^{3}\times 3&\equiv &6\times 3&=&18&\equiv &4{\pmod {7}}\\3^{5}&=&3^{4}\times 3&\equiv &4\times 3&=&12&\equiv &5{\pmod {7}}\\3^{6}&=&3^{5}\times 3&\equiv &5\times 3&=&15&\equiv &1{\pmod {7}}\\3^{7}&=&3^{6}\times 3&\equiv &1\times 3&=&3&\equiv &3{\pmod {7}}\\\end{array}

This kind of repetition shows up even in something as simple as the last digit of powers. A neat trick is using this modular property to deduce the last digit of a large integer.

For example, consider the integer 7. As we increment the powers of 7, the last digit begins to repeat: 7, 9, 3, 1, and so on.

Those four numbers repeat over and over again. So, we say that it has a cycle length of four:

\begin{align*} 7^1 &\equiv 7 \\ 7^2 &\equiv 49 \\ 7^3 &\equiv 343 \\ 7^4 &\equiv 2401 \\ 7^5 &\equiv 16807 \\ 7^6 &\equiv 117649 \\ 7^7 &\equiv 823543 \\ 7^8 &\equiv 5764801 \\ 7^9 &\equiv 40353607 \\ \end{align*}

Why do powers repeat like this? The answer is modular arithmetic, again. Similar to how primitive roots generate cycles, our last digit is a product of mod 10 -- and there are only ten possible numbers it can be, so it eventually cycles, too.

So, how can we use this knowledge to do a fun parlor trick, like guess the last digit of an extremely large integer, such as \(7^{3001}\)?

As demonstrated, the powers of 7 have a cycle length of four, so the last digit repeats every 4 numbers. It follows that we first divide our exponent, 3001, by 4.

\[ 3001 \div 4 = 750 \text{, with a remainder of } 1 \]

Since we are left with a remainder, we use it, calculating \(7^{1} = 7\). Thus, we know the last digit of the number \(7^{3001}\) is 7.

But what if the division has no remainder? For example, let's consider \(7^{3000}\). Dividing 3000 by 4 gives:

\[ 3000 \div 4 = 750 \text{, with a remainder of } 0 \]

Since there's no remainder, the exponent is a perfect multiple of the cycle length four, so we look at the last digit of \(7^4\), which equals 2401. The last digit of \(7^{3000}\) is 1.

Patterns like these show up so often in number theory that once you see them, you can’t unsee them.

Comments

Popular posts from this blog

yt-dlp Archiving, Improved

One annoying thing about YouTube is that, by default, some videos are now served in .webm format or use VP9 encoding. However, I prefer storing media in more widely supported codecs and formats, like .mp4, which has broader support and runs on more devices than .webm files. And sometimes I prefer AVC1 MP4 encoding because it just works out of the box on OSX with QuickTime, as QuickTime doesn't natively support VP9/VPO9. AVC1-encoded MP4s are still the most portable video format. AVC1 ... is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. [ 1 ] yt-dlp , the command-line audio/video downloader for YouTube videos, is a great project. But between YouTube supporting various codecs and compatibility issues with various video players, this can make getting what you want out of yt-dlp a bit more challenging: $ yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best...