I have encountered my fair share of in-house RC4 implementations from the 90s. Every single one of them was vulnerable in some way. They suffered from all kinds of issues: improper IV initialisation, predictable keystreams, and even partial leakage of plaintext into ciphertext. RC4's deceptively simple specification made it enticing to implement, giving developers a false sense of confidence and security.
As another example, Microsoft Outlook 2003 infamously used CRC32 to "hash" the personal folder (.PST) passwords: <https://www.nirsoft.net/articles/pst_password_bug.html>. Naturally, it was trivial to find a CRC32 checksum collision and open someone else's PST.
Thankfully, the industry has come a long way since then. These days, rolling your own cipher is, quite rightly, considered a red flag!
I've seen far too many IVs statically declared as "<Product>IV" in my lifetime.
Bonus marks for when the key was also "<Product>Key".
Nirsoft saved my ass so many times on different things. I remember when I lived somewhere without (reliable or consistent) internet access, I scraped all the tools to take with me. They still are in my tools folder to this day!
> RC4's deceptively simple specification made it enticing to implement, giving developers a false sense of confidence and security.
I never like the idea of hand implementing crypto, ever. Why would I not just use existing libraries?
RC4 was as much of a political statement as a technical spec. At the time, governments were banning ciphers for various reasons. RC4 was simple enough that you could memorize it to get around any prohibitions.
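For anyone who hasn't seen it: the whole cipher really does fit in your head. A minimal Python sketch of RC4 (long broken, for illustration only):

```python
def rc4(key: bytes, data: bytes) -> bytes:
    # Key-scheduling algorithm (KSA): permute S based on the key
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)
```

Since it is just XOR against a keystream, the same function encrypts and decrypts: `rc4(k, rc4(k, m)) == m`.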
Some languages like C++ don't have a dominant package manager, so adding even one dependency can be very difficult. Learning a niche package manager and asking your team to rely on it means teaching everyone how to use it, assuming it even supports your setup.
In this hostile environment, many wheels are reinvented
C programmers actually consider this a point of pride
90's crypto was interesting. They would just use naked RSA and block ciphers. Usually, the team would have one guy who was "smart about crypto" and he was just left to do his thing; after passing functional tests, it was accepted into the product. There was so much fun stuff to break, and it was just as much fun trying to keep people from breaking your stuff.
Even companies as well resourced as Microsoft made these mistakes well into the 2000s. Remember when they used plain old AES to encrypt the ViewState for ASP.NET? It was vulnerable to padding oracle attacks: https://en.wikipedia.org/wiki/Padding_oracle_attack#Attacks_...
Cryptography is such an esoteric and deep field that it's easy for a fairly smart but inexperienced engineer to misjudge the security of a particular implementation or usage of a cryptographic primitive.
> Even companies as well resourced as Microsoft made these mistakes well into the 2000s.
Indeed! As I just wrote in another comment on this page, Microsoft Outlook 2003 used CRC32 to "hash" the personal folder (.PST) passwords. Since CRC32 isn't a cryptographic hash, it was trivial to generate a collision and access someone else's Outlook personal folder. This flaw persisted until at least 2006! More details here: <https://www.nirsoft.net/articles/pst_password_bug.html>.
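For a sense of how weak CRC32 is as a "hash": its 32-bit output means a birthday-style search over random inputs finds a collision in roughly 2^16 tries. A quick sketch (the random 8-byte strings here are generic stand-ins, not Outlook's actual password handling):

```python
import os
import zlib

def find_crc32_collision():
    """Birthday search: draw random 8-byte 'passwords' until two share a CRC32."""
    seen = {}  # maps crc32 value -> first input that produced it
    while True:
        candidate = os.urandom(8)
        h = zlib.crc32(candidate)
        if h in seen and seen[h] != candidate:
            return seen[h], candidate
        seen[h] = candidate

a, b = find_crc32_collision()
assert a != b and zlib.crc32(a) == zlib.crc32(b)
```

On a modern machine this finds a colliding pair in well under a second, which is exactly why a checksum is not a password hash.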
I guess the thing about these examples is that cryptography can "visibly work" while being broken. The vast majority of people looking at the product will observe it to work "fine", in that nothing blows up.
The cryptography discussed in the article is probably more aptly titled 1980s microcomputer-oriented cryptography. QText was first released in 1988. I'm not sure when they first added support for passcode-protected files, but the version shown in the article was released in 1992. This is before the spread of the early internet, before RC4 was leaked, before MD5 and HMAC were released, and probably before even MD4 became widespread.
I admit I was too young to be well-versed in cryptography back then, but as far as I can tell the only well-known cryptographic algorithms that I can think of during the late 1980s were RSA and DES, maybe also ElGamal? I'm not aware of any cryptographic hash function which predates MD2. There must have been some, but I don't know if any of them really caught on.
Looking at PC software from the early 1980s up to the early 1990s, most software used 100% in-house, roll-your-own crypto. DES and RSA were initially too slow for microcomputers, and even when processing power increased, they were not so trivial to implement yourself and they weren't widely available in libraries until the mid 1990s.
So what you eventually got in this period was mostly ad-hoc algorithms that did very rudimentary encryption and were only as good as the author's imagination. If you were particularly unlucky, they wouldn't be much better than a glorified monoalphabetic cipher. This seems to be the case in QText as well. At least the key derivation function seems to be completely in-house and as the paper has demonstrated (and as you'd fully expect from an in-house algorithm), it has weaknesses that make MD4 seem secure.
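To illustrate how little a glorified monoalphabetic cipher buys you (a generic Caesar-style example, not QText's actual scheme), a constant-shift "cipher" falls to trying all 26 keys, no key material needed:

```python
def caesar(text: str, shift: int) -> str:
    """Shift each letter by a constant amount: a monoalphabetic substitution."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

ciphertext = caesar("attack at dawn", 5)
# Breaking it is just enumerating 26 candidate plaintexts and eyeballing them:
candidates = [caesar(ciphertext, -s) for s in range(26)]
assert "attack at dawn" in candidates
```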
I think PGP (first released in 1991) is where we can see the trend start shifting into composing more-or-less standard algorithms using insecure in-house constructions. The first version of PGP used an in-house symmetric cipher called Bass-O-Matic (together with RSA and MD4), but PGP 2.0 replaced that cipher with IDEA[1]. It seems like in the beginning even the RSA signature format was non-standard, and PGP switched to a PKCS #1-based format only in version 2.3[2].
This is where you start seeing all the famous 1990s schemes that go horribly wrong by misusing IVs or performing key derivation with a single iteration of an unsalted hash. But 80s crypto is even worse.
[1] http://www.cypherspace.org/adam/timeline/
[2] https://www.rfc-editor.org/rfc/rfc1991.html
90's crypto exists today: NTLMv2 auth used by SMB is something like HMAC-MD5(MD4(password)).
I mean, they were implementing straight out of Applied Cryptography. How good a job could they possibly have done?
A fun thing to look at today is `deslogin`, the predecessor to SSH.
Why waste 2 seconds of my time making your website have a splash screen?
I know this would be less fun, but given that the key space was only 36^4, why not just run the actual decryption functionality in QText? Like, even if it takes 1 second to decrypt, spin up 32 cores and wait a day. They allude to the idea that checking the key derivation is faster, but I wonder by how much.
(of course, it’s still interesting to read about 90s encryption, so I appreciate that they did it the fun way)
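For what it's worth, once the derivation and check are extracted, the search itself is a few lines. A sketch in Python, with `derive_and_check` as a hypothetical stand-in for the reverse-engineered routine (the real check is whatever QText computes):

```python
from itertools import product
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 characters

def brute_force(derive_and_check):
    """Try all 36**4 (about 1.68M) passcodes; derive_and_check is a stand-in
    for the reverse-engineered key-derivation + verification routine."""
    for combo in product(ALPHABET, repeat=4):
        candidate = ''.join(combo)
        if derive_and_check(candidate):
            return candidate
    return None

# Toy stand-in check, just to exercise the loop:
assert brute_force(lambda p: p == "QT92") == "QT92"
```

Even in pure Python the full 36^4 sweep takes seconds, which is why extracting the check beats automating the real UI.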
I assume the hard part would have been automating this or extracting the key derivation and check code out of QText so you can run it separately in a loop.
I'm pretty sure you can automate DOSBox input, but if you're more comfortable with reversing algorithms than writing a reliable UI automation script, then what they did isn't necessarily overkill.
Love this article. Brings back memories of a simpler time, spending way too much time doing exactly this with IDA Pro and Bochs (my favorite tool at the time for these sorts of projects). Bochs plus custom plugins equaled some amazing capabilities for real-time dynamic analysis of DOS, bootcode, and other low-level applications.
And crazy OS debugging up to i7 processors.
Who would have thought that in 2025 I'd be hyper alert to cute ASCII art splashes where one row is mysteriously misaligned?
It's the details that are the giveaway.
Very cool that ScummVM was allowed to host IDA Pro 5.0 for free. I will be playing around with that tonight :)
Interesting read, but none of the images will load for me. Throttled by wix?
edit: I'm not sure why this is being downvoted. The website looks like this to me: https://i.imgur.com/q6846RF.png
edit2: not throttled by wix, although they are hosted with wix. There's some strange url parameters in the image src attribute, which I assume are supposed to do something fancy. That fancy bit isn't working.
I'm having the same issue, both Firefox and Vivaldi with or without a VPN the images are just low-res thumbnails. I tried to load the site in the Tor browser, but got 403'd.
They show like that for me briefly and then load. Maybe the decision to load the image is done dynamically in JS as a scraping countermeasure.
They do not load for me, even after waiting for several minutes. I've tried in Chrome, Firefox, and Brave. No ad blockers, nothing unusual.
edit: They work on my Android phone.
I find it amusing that in 02023, after 77 years of software development, they referred to something like 01992 as "early software development", because people had only been developing software for 46 years at that time. But it's true that most software that has been written so far was written after that.
Are we still in "early software development"? Presumably most software that will ever be written hasn't been written yet.
Well, yes and no. For many people in our profession, anything related to computers is thought to have started in the late 70s when computers became things that worked outside of elaborate data centers. It is somewhat amusing that the collective memory now is that the mouse and GUI were invented by Xerox, virtualization is thought to be a thing from the late 90s, and touch screens are from the 2000s, even though all that technology was around since the 60s. We must have driven the old-timers nuts with all the widespread mainstream bragging about our “inventions”.
Yeah, I always struggle with how to describe what Xerox invented GUIwise. Sutherland's SKETCHPAD in 01963 had an interactive CAD graphical user interface with windows, icons, and a pointer—but no menus, overlapping windows, or desktop, and not much text (it slowed the display list redraw down a lot). NLS had white backgrounds, hypertext, and a mouse, but still no command menus or WYSIWYG editing or overlapping windows. You issued textual commands to make edits to the displayed text. What I'm using now to write this is recognizably "the same thing" as Smalltalk-76 in a way that Smalltalk-76 wasn't the same thing as Smalltalk-72 or NLS or Sketchpad.
So, "the desktop GUI"? But Smalltalk-80 didn't have a desktop in the sense of a place to represent your files with icons, even if Star did. WYSIWYG? Direct manipulation? But Shneiderman's #1 example of "direct manipulation" is Emacs.
But it's also recognizably "the same" as medieval manuscripts in many ways!
Your octal years aren’t coming out right
Yeah.. and it's annoying to read.. I always laugh at people who try to make IDs with leading zeros. and then one day BOOM!! overflow.
It's especially common in naming schemes like: THING-01, THING-02.. "we will never have more than 100 of them".. and then BOOM.
I always say: leave it at fucking 1 and count up. That's why we invented natural sorting, to sort this out...
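For reference, natural sorting is just a small key function. A typical Python sketch:

```python
import re

def natural_key(s: str):
    """Split into digit and non-digit runs so embedded numbers compare numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', s)]

names = ["THING-10", "THING-2", "THING-1"]
assert sorted(names, key=natural_key) == ["THING-1", "THING-2", "THING-10"]
```

Plain lexicographic sort would put THING-10 before THING-2, which is exactly the problem leading zeros try (and eventually fail) to paper over.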
The thing is... it's generally safe to truncate a leading zero [0], but it's not necessarily safe to truncate a trailing zero. For example, sometimes trailing zeros convey precision, and then you've got SEMVER [1] causing situations like Drupal 7.1 and 7.10 and 7.100 (spanning 100 minor releases).
[0] ZIP codes and phone numbers are important exceptions, but it's a non-issue if you always process these as strings, never as numbers, which is a reasonable constraint because we don't need to sort these numerically. Lexicographical sort is perfectly fine.
[1] The concept mentioned in footnote 0 does not really apply to SEMVER, because we do like to sort versions numerically. Lexicographical sort is wrong. But it's a group of dot-delimited integers, not to be conflated with floats, so while 7.100 comes before 7.2 when sorting floats, 7.100 comes after 7.2 when sorting SEMVER because the 2 and 100 are just integers.
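The footnote's point, in code: split on dots and compare integer tuples, and you get the SEMVER order that float comparison gets wrong:

```python
def semver_key(version: str):
    """Dot-delimited integers, compared component-wise."""
    return tuple(int(part) for part in version.split('.'))

versions = ["7.100", "7.2", "7.10", "7.1"]
assert sorted(versions, key=semver_key) == ["7.1", "7.2", "7.10", "7.100"]

# Whereas as floats, 7.100 equals 7.1 and comes before 7.2:
assert float("7.100") < float("7.2")
```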
Counterpoint: Natural sorting likely will be orders of magnitude slower sorting than ordinal sorting.
For most situations, the better solution is storing the index separately to the name in another column or in metadata.
But for some stores, there isn't an easy way to store or sort on metadata, and so prefixing leading zeroes keeps things sorted naturally while using the more efficient sort.
Oh, of course it is. But if you have several hundred up to several thousand items, you do not care. Computers are here to do the heavy lifting. If we're talking about millions of items, it's a completely different story. A simple numerical ID is probably out of the question and you start to shard them somehow. Use the right tool for the right task!
ah.. so use THING-001. got it
See https://longnow.org/ideas/long-now-years-five-digit-dates-an....
So a little silly, a little serious.
On a practical note, we don’t tend to prefix zeroes to numbers because they are superfluous. If programmers are using strings to store a year and those strings are limited to four digits, your project likely has a host of other issues that will become problems long before Y10K.
We already have a precedent, in programming, for prefixed zeroes having meaning: “an octal number follows.” Much like 0x indicates a hexadecimal number.
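A quick demonstration of that precedent, using Python, which made the octal prefix explicit (`0o`) precisely because the bare leading zero was so easy to misread:

```python
# C-family languages inherited "leading zero means octal" from K&R C: 010 == 8.
assert 0o10 == 8          # Python's explicit octal prefix
assert int("10", 8) == 8  # parsing a string as base 8

# Python 3 rejects the old ambiguous leading-zero form outright:
ambiguous_form_rejected = False
try:
    eval("010")
except SyntaxError:
    ambiguous_form_rejected = True
assert ambiguous_form_rejected
```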
I hope this is satire?
Just in case it's serious or semi-serious:
It's utterly ridiculous to worry about 10k date problems given we first have these:
2038 problem ( Signed unix time overflow )
2069 problem ( strptime() parsing )
2079 problem ( unsigned days since 1 January 1900 )
2100 problem ( FAT/DOS )
2106 problem ( Unsigned unix time overflow )
Further out but still way before 10k:
2262 ( signed nanoseconds since 1 January 1970 )
And that's just the bigger ones. ( See: https://en.wikipedia.org/wiki/Time_formatting_and_storage_bu... )
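The 2038 one in concrete terms: a signed 32-bit time_t tops out at 2^31 - 1 seconds past the Unix epoch. A quick Python check of where that lands:

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit time_t can hold:
MAX_32BIT_TIME = 2**31 - 1
rollover = datetime.fromtimestamp(MAX_32BIT_TIME, tz=timezone.utc)
assert rollover.year == 2038
print(rollover)  # 2038-01-19 03:14:07+00:00
```

One second later, a signed 32-bit counter wraps to December 1901.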
What's the point of prefixing 0 to dates written in forum posts? It just confuses contemporary human readers.
Historians do a reasonable job of adequately translating dates from thousands of years ago across multiple calendar changes and societal collapses. Whatever future historian reads your post 10k+ years from now, should it survive, will be able to work out its date from the language and other context clues alone.
It'll be hard to confuse 12025 with 2025 in the same way it's hard to confuse 2025 with AD 25.
a leading zero "implies" octal at least since K&R C, which predates that page by 000000043 years. You guys need a different prefix for that.
It's when the idea of selling software became accessible (in terms of education / affordability) to most people.
That hasn't happened yet, but it's close.