
Latin1 vs UTF8

Latin1 was the early default character set for documents delivered via HTTP with MIME types beginning with text/. Today, only around 1.1% of websites on the internet use the encoding, along with some older applications. However, it is still the most popular single-byte character encoding scheme in use. A funny thing about Latin1 is that it maps every byte from 0 to 255 to a valid character, which means that literally any sequence of bytes can be interpreted as a valid string. The main drawback is that it only supports characters from Western European languages. UTF8 is different on both counts. Unlike Latin1, UTF8 supports a vastly broader range of characters from different languages and scripts. But as a consequence, not every byte sequence is valid, because UTF8 uses multi-byte sequences for characters beyond the ASCII range. This is also why you can't just throw any sequence of bytes at it and e...
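A quick way to see the difference is to decode the same bytes both ways. This is a minimal sketch using Python's built-in codecs; the helper name is_valid_utf8 is made up for illustration.

```python
# Latin-1 assigns a character to every byte value, so decoding never fails.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")
print(len(latin1_text))  # one character per input byte


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE9 is e-acute in Latin-1, but on its own it is only the start of a
# multi-byte sequence in UTF-8, so the UTF-8 decoder rejects it.
lone_byte = b"\xe9"
print(lone_byte.decode("latin-1") == "\u00e9")
print(is_valid_utf8(lone_byte))
print(is_valid_utf8("\u00e9".encode("utf-8")))
```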

DMARC

Lately I've overheard some people discussing email spoofing with regard to organizations that don't implement DMARC (Domain-based Message Authentication, Reporting and Conformance). Namely, "APTs" taking advantage of organizations that don't use it.

From outside a company, and sometimes even from inside, it seems there's not a lot one can do to change its security posture. Incentives are sometimes complex. Or sometimes there's a sort of creeping security nihilism.

But DMARC works like this. It builds on two older mechanisms. Sender Policy Framework (SPF) designates who can send email for a domain - that is to say, SPF authorizes a group of senders. DomainKeys Identified Mail, or DKIM, allows messages to be signed, so that receivers can check whether they were forged. For DKIM, a domain publishes its public key in a DNS TXT record, much as it publishes A or MX records, like this:

"k=rsa; t=s; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDDmzRmJRQxLEuyYiyMg4suA2Sy
MwR5MGHpP9diNT1hRiwUd/mZp1ro7kIDTKS8ttkI6z6eTRW9e9dDOxzSxNuXmume60Cjbu08gOyhPG3
GfWdg7QkdN6kR4V75MFlw624VY35DaXBvnlTJTgRg/EW72O1DiYVThkyCgpSYS8nmEQIDAQAB"

This is used in conjunction with information that's relayed along with the email - but which is independent of SMTP - and arrives in an email header, like this:

DKIM-Signature: v=1; a=rsa-sha256; d=example.net; s=brisbane;
     c=relaxed/simple; q=dns/txt; i=foo@eng.example.net;
     t=1117574938; x=1118006938; l=200;
     h=from:to:subject:date:keywords:keywords;
     z=From:foo@eng.example.net|To:joe@example.com|
       Subject:demo=20run|Date:July=205,=202005=203:44:08=20PM=20-0700;
     bh=MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTI=;
     b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSbav+yuU4zGeeruD00lszZ
              VoG4ZHRNiYzR
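Both the key record and the signature header are lists of tag=value pairs separated by semicolons, so a receiver can pull them apart with very little code. The following is a rough sketch, not a DKIM verifier: it only parses the two examples above, builds the DNS name where the key for that selector and domain would be published, and decodes the key bytes. The function name parse_tag_list is hypothetical.

```python
import base64

# The key record and header value, copied from the examples above.
KEY_RECORD = (
    "k=rsa; t=s; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDDmzRmJRQxLEuyYiyMg4suA2Sy"
    "MwR5MGHpP9diNT1hRiwUd/mZp1ro7kIDTKS8ttkI6z6eTRW9e9dDOxzSxNuXmume60Cjbu08gOyhPG3"
    "GfWdg7QkdN6kR4V75MFlw624VY35DaXBvnlTJTgRg/EW72O1DiYVThkyCgpSYS8nmEQIDAQAB"
)
SIGNATURE = """v=1; a=rsa-sha256; d=example.net; s=brisbane;
     c=relaxed/simple; q=dns/txt; i=foo@eng.example.net;
     t=1117574938; x=1118006938; l=200;
     h=from:to:subject:date:keywords:keywords;
     z=From:foo@eng.example.net|To:joe@example.com|
       Subject:demo=20run|Date:July=205,=202005=203:44:08=20PM=20-0700;
     bh=MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTI=;
     b=dzdVyOfAKCdLXdJOc9G2q8LoXSlEniSbav+yuU4zGeeruD00lszZ
              VoG4ZHRNiYzR"""


def parse_tag_list(text: str) -> dict[str, str]:
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}, dropping folding whitespace."""
    tags = {}
    for part in text.split(";"):
        if "=" in part:
            # Split on the first '=' only: values such as base64 may contain '='.
            name, value = part.split("=", 1)
            tags[name.strip()] = "".join(value.split())
    return tags


sig = parse_tag_list(SIGNATURE)
key = parse_tag_list(KEY_RECORD)

# The selector (s=) and domain (d=) tell the receiver where to look up the key.
query_name = f"{sig['s']}._domainkey.{sig['d']}"
# t= and x= are the signing and expiry timestamps, in seconds.
lifetime_seconds = int(sig["x"]) - int(sig["t"])
# p= is the base64-encoded public key.
public_key_der = base64.b64decode(key["p"])

print(query_name)
print(sig["a"], key["k"], lifetime_seconds, len(public_key_der))
```

Actually checking the b= signature takes an RSA library and the canonicalization rules named in c=, which is why this sketch stops at parsing.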

In short, whenever an email arrives and a recipient wants to authenticate it, they can do so by verifying the signature in the header against the domain's published DKIM key. But how do we know if an organization implements DMARC? A DMARC policy is also published in DNS, as a TXT record at _dmarc.<domain>, so we can use nslookup to check whether an organization has one and, if it does, view what kind of policy it sets. Per Wikipedia, DMARC provides a few different policies.

- none is the entry-level policy. No special treatment is required of receivers, but it enables a domain to receive feedback reports.

- quarantine asks receivers to treat messages that fail the DMARC check with suspicion; different receivers have different means of implementing that, for example flagging messages or delivering them to the spam folder.

- reject asks receivers to outright reject messages that fail the DMARC check.

With nslookup, we can check what kind of policy a domain uses, if any. For example, here's a check of both Google's and Sony's DMARC policies.

$ nslookup -type=txt _dmarc.google.com
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
_dmarc.google.com	text = "v=DMARC1; p=reject; rua=mailto:mailauth-reports@google.com"

Authoritative answers can be found from:

$ nslookup -type=txt _dmarc.sony.com
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
_dmarc.sony.com	text = "v=DMARC1; p=none; rua=mailto:dmarc_agg@vali.email,mailto:dmarc_rua@emaildefense.proofpoint.com; ruf=mailto:dmarc_ruf@emaildefense.proofpoint.com;fo=1"

Authoritative answers can be found from:
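
The records returned above use the same tag=value layout, so reading the policy out of them programmatically is straightforward. A minimal sketch, assuming the record text has already been fetched (for example, copied from the nslookup output); parse_dmarc and the POLICY_MEANING table are illustrative names, and the descriptions paraphrase the list of modes above.

```python
# Record strings copied from the nslookup answers above.
RECORDS = {
    "google.com": "v=DMARC1; p=reject; rua=mailto:mailauth-reports@google.com",
    "sony.com": (
        "v=DMARC1; p=none; rua=mailto:dmarc_agg@vali.email,"
        "mailto:dmarc_rua@emaildefense.proofpoint.com; "
        "ruf=mailto:dmarc_ruf@emaildefense.proofpoint.com;fo=1"
    ),
}

# Short paraphrases of the three policies described earlier.
POLICY_MEANING = {
    "none": "no special treatment, reports only",
    "quarantine": "treat failing mail with suspicion",
    "reject": "reject failing mail outright",
}


def parse_dmarc(record: str) -> dict[str, str]:
    """Hypothetical helper: turn a DMARC TXT record into a tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            tags[name.strip()] = value.strip()
    return tags


summary = {}
for domain, record in RECORDS.items():
    tags = parse_dmarc(record)
    # rua= may list several aggregate-report addresses, separated by commas.
    report_addresses = [a for a in tags.get("rua", "").split(",") if a]
    summary[domain] = (tags["p"], report_addresses)
    print(domain, tags["p"], "-", POLICY_MEANING[tags["p"]])
```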


yt-dlp Archiving, Improved

One annoying thing about YouTube is that, by default, some videos are now served in .webm format or use VP9 encoding. However, I prefer storing media in more widely supported codecs and formats, like .mp4, which runs on more devices than .webm. And sometimes I prefer AVC1 MP4 encoding because it just works out of the box on OSX with QuickTime, as QuickTime doesn't natively support VP9/VP09. AVC1-encoded MP4s are still the most portable video format. AVC1 ... is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video industry developers as of September 2019. [1] yt-dlp, the command-line audio/video downloader for YouTube videos, is a great project. But between YouTube supporting various codecs and compatibility issues with various video players, getting what you want out of yt-dlp can be a bit more challenging: $ yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best...