
Latin1 vs UTF8

Latin1 was the early default character set for documents delivered via HTTP with MIME types beginning with text/. Today, only around 1.1% of websites use the encoding, along with some older applications. However, it is still the most popular single-byte character encoding scheme in use. A funny thing about Latin1 is that it maps every byte from 0 to 255 to a valid character, which means that literally any sequence of bytes can be interpreted as a valid string. The main drawback is that it only supports characters from Western European languages. The same is not true for UTF8. Unlike Latin1, UTF8 supports a vastly broader range of characters from different languages and scripts. But as a consequence, not every byte sequence is valid. This is due to UTF8's added complexity: it uses multi-byte sequences for characters beyond the ASCII range. This is also why you can't just throw any sequence of bytes at it and ex...

Subshells in Linux (and Windows)

Or rather, subshells in Bash and PowerShell. A subshell is a sort of isolated environment for executing commands: the commands run in a child process of the parent shell. This lets a user define specific environment variables on a per-process basis, so child processes can be created with distinct characteristics.

In Bash

Imagine you have a Bash script that alters certain exported variables, but you don't want those changes to affect the rest of the script or the shell that launched it. Enter subshells. In Bash, a subshell is opened and closed with parentheses. Here's a simple example:

#!/bin/bash

echo "PATH before subshell: $PATH"
echo " "
(
  subshell_path="/Users/hexagr/subshell"
  export PATH="$subshell_path"
  echo "PATH within subshell: $PATH"
  
  echo " "
  # Run the script by name; it is found via the subshell's PATH
  subshell_cmd="do_stuff.sh"
  $subshell_cmd
  echo " "
)

# Print the PATH after the subshell
echo "PATH after subshell: $PATH"
echo " "
# Try the same command in the regular shell. $subshell_cmd was set inside
# the subshell, so it is empty here, and do_stuff.sh is not on our PATH.
echo "Executing do_stuff.sh after subshell:"
$subshell_cmd
do_stuff.sh

Here is the script's output. Our other shell script, do_stuff.sh, just echoes a simple message, and it is only reachable through the custom path inside the subshell:

$ ./test.sh
PATH before subshell: /usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin
 
PATH within subshell: /Users/hexagr/subshell
 
This path only affects the current subshell.
 
PATH after subshell: /usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin
 
Executing do_stuff.sh after subshell:
./test.sh: line 22: do_stuff.sh: command not found
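
The same isolation applies to ordinary variables and the working directory, not just PATH, which is also why $subshell_cmd expands to nothing once the parentheses close. Here is a minimal sketch of that behavior. The names greeting and only_inside are purely illustrative and do not come from the script above:

```shell
#!/bin/bash
# Minimal sketch: state changed inside ( ... ) stays inside.
# The names "greeting" and "only_inside" are illustrative.

greeting="set in parent"
start_dir=$PWD

(
  greeting="set in subshell"   # modifies the subshell's copy only
  only_inside="do_stuff.sh"    # never exists in the parent
  cd /                         # the working directory is per-process too
  echo "inside:  greeting=$greeting only_inside=$only_inside dir=$PWD"
)

# Back in the parent: none of the changes above are visible here.
echo "outside: greeting=$greeting only_inside=${only_inside:-<unset>} dir=$PWD"
```

The subshell starts with a copy of the parent's state, so it can read greeting, but nothing it assigns, exports, or changes with cd flows back to the parent.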

A Windows Analog in PowerShell

While Microsoft Windows doesn't officially call this a "subshell" as far as I can tell, the following strategy provides functionally similar behavior on Windows.

We can use the .NET API to manipulate environment variables on a per-process basis in PowerShell. But before we do so, let's print our regular system %PATH% with cmd.exe /c echo %PATH%:

[Screenshot: Windows regular system %PATH%]

Now let's use the ProcessStartInfo class from .NET's System.Diagnostics namespace, creating a new object called $x with New-Object.

We'll set the filename to cmd.exe along with our argument. Then we'll remove the original system Path and replace it with our own custom path. We'll also set UseShellExecute to false. This is required for the custom environment variables to take effect: the process is then created directly from our process rather than through the operating system shell.

Finally, we'll create a System.Diagnostics.Process object and assign it to $p, set its StartInfo to our $x ProcessStartInfo object, and launch it with $p.Start().

Thank you Microsoft Documentation and StackOverflow ;)

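# Describe the process to launch
$x = New-Object System.Diagnostics.ProcessStartInfo
$x.FileName = "cmd.exe"
$x.Arguments = "/c echo %PATH%"
# Swap the inherited Path for a custom one (affects only the new process)
$x.EnvironmentVariables.Remove("Path")
$x.EnvironmentVariables.Add("PATH", "C:\custom\path")
# Must be false for the custom environment variables to be used
$x.UseShellExecute = $false
# Create the process, attach the start info, and launch it
$p = New-Object System.Diagnostics.Process
$p.StartInfo = $x
$p.Start()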

We can see that our new subprocess is effectively confined to C:\custom\path: we created a child process with custom environment variables, removed its regular system Path, and set its %PATH% to our custom directory. After the cmd.exe subprocess exits and we're back in the regular shell, we can print the default system path to confirm that we didn't affect any environment variables in the main shell.

[Screenshot: Windows console]
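
For comparison, Bash can hand a single child process its own PATH without a subshell, by prefixing the command with an assignment. This is roughly the Bash counterpart of the ProcessStartInfo approach. It is a minimal sketch that assumes a POSIX shell exists at /bin/sh. Here /custom/path is a placeholder directory that does not need to exist:

```shell
#!/bin/bash
# Minimal sketch: give one child process a custom PATH.
# "/custom/path" is a placeholder, not a real directory.

original_path=$PATH

# The PATH=... prefix applies only to this one child process.
# /bin/sh is called by absolute path, so it starts even though
# the custom PATH contains no executables.
child_path=$(PATH="/custom/path" /bin/sh -c 'echo "$PATH"')

echo "PATH seen by child: $child_path"
echo "PATH in this shell: $PATH"
```

As in the PowerShell version, only the launched process sees the replacement value, and the calling shell's PATH is left as it was.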
