posts update

2023-12-13 16:14:50 +00:00
parent 9718d84565
commit fabfd9705c
8 changed files with 218 additions and 5 deletions
--- a/posts/_metadata.json
+++ b/posts/_metadata.json
@@ -7,6 +7,30 @@
    "author": "jetstream0/Prussia",
    "tags": ["meta", "code", "project", "web", "markdown", "typescript_javascript", "css"]
  },
+  "hash-functions": {
+    "title": "Hash Functions",
+    "slug": "hash-functions",
+    "filename": "hash_functions",
+    "date": "13/12/2023",
+    "author": "jetstream0/Prussia",
+    "tags": ["cryptography"]
+  },
+  "rushed-captcha": {
+    "title": "Rushed Captcha Rewrite",
+    "slug": "rushed-captcha-rewrite",
+    "filename": "rushed_captcha_rewrite",
+    "date": "13/12/2023",
+    "author": "jetstream0/Prussia",
+    "tags": ["project", "hosting", "typescript_javascript", "ruby", "faucets"]
+  },
+  "dbless-captcha": {
+    "title": "DBless Captcha",
+    "slug": "dbless-captcha",
+    "filename": "dbless_captcha",
+    "date": "13/11/2023",
+    "author": "jetstream0/Prussia",
+    "tags": ["project", "ruby", "docs"]
+  },
  "downloading-my-spotify-playlist-for-free": {
    "title": "Downloading my Spotify Playlist for Free",
    "slug": "downloading-my-spotify-playlist-for-free",
@@ -85,7 +109,7 @@
    "filename": "190k_faucet",
    "date": "12/02/2023",
    "author": "jetstream0/Prussia",
-    "tags": ["project", "web", "milestone", "cryptocurrency"]
+    "tags": ["project", "web", "milestone", "cryptocurrency", "faucets"]
  },
  "adding-commas": {
    "title": "Adding Commas to Numbers",
--- a/posts/dbless_captcha.md
+++ b/posts/dbless_captcha.md
@@ -0,0 +1,72 @@
+Google's Recaptcha costs money for large customers, and is biased towards those who are using Chrome or are logged in with a Google account. Hcaptcha can be easily bypassed with an accessibility cookie.
+
+Seeing these problems, and since it seemed interesting, I decided to make my own text captcha.
+
+It wouldn't be a (good) replacement, since text captchas can be cracked trivially with an ML model. I'm sure most Recaptchas and hCaptchas can also be broken with machine learning, but those can be bypassed a lot more easily by paying fractions of a cent per solve to a captcha solving service like 2captcha (which iirc just turns around and pays real people to do them). Still, just needing to run a model or pay someone discourages would-be botters, as they may decide it's not worth the effort.
+
+Back on topic, while my text captcha wouldn't be a replacement for stuff like Recaptcha, there are other uses. For example, on platforms like Discord, if someone wanted to have a bot with a captcha (usually to prevent bots spamming members with scammy DMs), the only way they could use Recaptcha or hCaptcha would be by directing users to their own website, where they have the captcha. That's pretty annoying for the user and adds complexity to the bot. Ideally, users should be able to complete the captcha without leaving Discord.
+
+In that case, a text captcha with a good API would mean that all the bot has to do is send a message with the image of the captcha (image URL provided by the API, of course), then wait for the user to respond, and check if it is correct. Having an API would mean the captcha wouldn't be restricted to just Discord - it could be used on any service with a backend.
+
+Additionally, I didn't feel like using a database, as it would mean more setup, so I tried using cryptography to make a database unnecessary.
+
+I don't think this was a terribly interesting project, code or concept wise, but I just felt like writing this. Oh, learning about Salsa20 was cool though.
+
+## Implementation
+
+The server has a 32 byte secret key. When it is asked for a captcha, it generates an 8 byte nonce and a 6 character code (the alphanumeric thing the user needs to enter to solve the captcha). The code is encrypted with the secret key and nonce by Salsa20.
+
+> In cryptography, nonces are numbers (random ones, here) that are used only once to improve security. If the same nonce is used more than once with the same key, though, security may be compromised.
+
+Then, once the server gets a request to generate the captcha image (the request provides it with the nonce and encrypted code), it decrypts the encrypted code with the given nonce and the secret key, then generates the image of that decrypted text.
+
+Finally, the server gets a request to verify that the user's answer to the captcha is correct. It is given the user's guess, the encrypted code, and the nonce. Again, it decrypts the encrypted code with the nonce and secret key. If the decrypted code matches the user's guess, the user is correct. If not, the user is wrong.
+
+Since the client does not know the server's secret key, they cannot know the code, unless they read the captcha image. All that cryptography also allows the server to create and verify the captcha without storing information anywhere (no database needed!).
+
+There is also a timestamp that is appended to the code before it is encrypted, which allows captchas to expire (eg, after 5 minutes, you need to ask for a new captcha if you haven't successfully solved this one).
+
+## API
+
+### GET `/captcha`
+Returns `image`, `code`, and `nonce`. Here, the `code` is the encrypted form of the 6 character alphanumeric code that the user is supposed to read from the captcha image. I don't know why I called both the encrypted and unencrypted codes "code". Sorry if it's confusing. Here's an example response:
+
+```json
+{
+  "image": "20ee1a711f77e7aba151eb66584ed8e374.png",
+  "code": "20ee1a711f77e7aba151eb66584ed8e374",
+  "nonce": "40ae72c55dda39fb"
+}
+```
+
+### GET `/encrypted/<encrypted>.png?nonce=<nonce>`
+Decrypts `<encrypted>` with the nonce and the secret key, extracting the code from the decrypted text. Creates a 210x70 png with the code text, then draws dots and lines and blurs and whatnot to make it a little bit harder to automate.
+
+![the generated image](/images/captcha.png)
+
+### POST `/captcha`
+This endpoint is to verify whether a user successfully solved a captcha. The encrypted `code`, `nonce`, and user's `guess` must be sent in a payload. The Ruby captcha wants the payload to be sent as form data, while the Node.js captcha wants it as JSON.
+
+The response will be JSON with either `"success": true` or `"success": false`. Success means the user successfully solved the captcha. In the Node.js version, there may also be an `"error"` key (in addition to `"success": false`) if the request sent is invalid.
+
+Example response:
+
+```json
+{
+  "success": true
+}
+```
+
+> I thought I could streamline the process and remove the need for the POST request to `/captcha` by returning the [hashed and salted](/posts/hashing) answer, so the service that uses the captcha could validate the result for itself. But then I remembered why I went for symmetric encryption instead of hashing in the first place: the captcha service needed to know the image's text content to generate the image, and hashing would obviously make that impossible.
+> It could still be done if I stored a map of image URL to the answer in the database, but the entire point of this project was to **not** use a database. Also, if I was storing stuff into the database, there was no point in hashing, as I could associate any key with the answer. Plus, hashing it without a salt would be terribly insecure, since anyone could easily pre-compute the hashes of all possible 6 character alpha-numeric strings and match them to instantly solve the captcha.
+
+## How To Use The API
+
+Here's how you would probably do it:
+
+1. Send a GET request to `/captcha`. Send the image and nonce to the user.
+2. Get the user to submit an answer, along with the code (the image url without the ".png") and nonce.
+3. Make a POST request to `/captcha` with the code, nonce, and user's guess.
+4. Let them through if the request gives a successful response. Do not if it doesn't.
+
+Try the captcha out at [captcha.prussia.dev](https://captcha.prussia.dev), and see the code on [GitHub](https://github.com/jetstream0/Captcha).
--- a/posts/fermats_little_theorem.md
+++ b/posts/fermats_little_theorem.md
@@ -9,14 +9,14 @@ Anyways, using Fermat's Little Theorem, I wanted to create a little function tha

 Then, for each check, we can generate a random `x`, calculate `x\^N - x`. If we call that, say, `m`, then we can do `m % N`, where `%` means modulo. If `m % N` is zero, that means `m` is a multiple of `N`, so we continue. If not, then we know the input is **not** a prime number, and can end it there. If we generate `x` many times, and `m % N` is always `0`, we can conclude with high probability that the input is a prime.

-At first, I mistakenly did `N % m`, but that means `m` is a factor (not a *multiple*) of `N`, and `N` would be by definition, not a prime number (only factors of a prime number are itself and 1). I realised the problem pretty quickly.
+At first, I mistakenly did `N % m === 0`, but that means `m` is a factor (not a *multiple*) of `N`, and `N` would, by definition, not be a prime number (assuming `m` is not 1 or `N`, as the only factors of a prime number are itself and 1). I realised the problem pretty quickly.

 After fixing that, this is the code I had:

 ```js
 function is_prime(potential_prime, iterations=50) {
  for (let i=0; i < iterations; i++) {
-    let x = BigInt(Math.floor(Math.random()*10000)); //0 =< x =< 9999
+    let x = BigInt(Math.floor(Math.random()*10000)); //0 <= x <= 9999
    let m = x**potential_prime - x;
    if (m%potential_prime !== BigInt(0)) {
      return false;
--- a/posts/hash_functions.md
+++ b/posts/hash_functions.md
@@ -0,0 +1,48 @@
+This aims to give a working understanding of what hash functions are and what their uses are. I won't go into the actual math, since I'm unqualified to talk about that, and I probably wouldn't understand it anyways. We're also not going to talk about hash tables since my understanding of them is very limited.
+
+A hash function is basically an algorithm that takes data as an input, and outputs something with these important properties:
+
+- **Irreversible**: Given the output of a hash function, it should not be possible to (algorithmically) find the input
+- **Uniformity**: Inputs should have evenly distributed outputs - meaning, if you split the possible ranges of outputs into *n* buckets, and randomly hashed stuff a gazillion times, all the buckets should have a roughly equal amount of outputs in them. This also means that two similar but not identical inputs will likely have completely different outputs
+- **Deterministic**: The same input will always give the same output (well, that's more of a characteristic of functions in general)
+- **Unique**: No two inputs should give the same output
+- **Fixed length**: An input of arbitrary length will be converted into an output of fixed length
+
+Now wait a second - **the last two properties seem to contradict each other**. How can no two inputs give the same output if an infinite number of inputs become a finite number of outputs? Well, hash functions are not immune to the rules of logic, so it isn't actually true that no two inputs give the same output. However, *a good hash function should make it practically impossible for anyone to find two inputs that result in the same output* (called a collision). If a collision can be found, the hash function would be unsafe to use for most usecases.
+
+# Usecases?
+
+Hashing has a huuuuggggeeeee amount of applications, but I'll describe a few.
+
+## Checksums
+
+What if your best friend, the President of the United States, wants to send you an important large file (say, a video of Colonel Sanders making fried chicken), but can't physically give it to you via a USB stick or hard drive? Your friend might upload the file to a file sharing service, and share the link with you. But how do you know the file hasn't been tampered with? The file sharing service might be run by someone who has an interest in preventing you from eating some finger licking good chicken, who'll replace the real video with a fake video with Colonel Sanders putting toxic ingredients into the chicken and undercooking it. To verify the video is genuine, the President could hash the video, and tell you the hash in person (to avoid anyone also tricking you about what the hash is). Once you download the video, you can hash it too. If the hashes match (and the hash function is secure), you can be confident that the video hasn't been messed with. If they don't, you know the video has been modified.
+
+This is possible since hash functions shouldn't have two inputs that result in the same output.
+
+## Passwords
+
+Instead of storing passwords in plaintext, which would result in disaster if the database was hacked, passwords are typically hashed. Since hash functions are supposed to be irreversible, the password remains a secret, but can still be verified - the site can hash the user provided password, and make sure it matches the hash it has on file (since the same input will always get the same output).
+
+### Rainbow Tables and Salting
+
+However, if an attacker gets their hands on a database full of hashed passwords, they can still easily crack many of the passwords with something called a rainbow table. Essentially, attackers can precompute the hashes of millions of likely or known passwords before they even attack. Since the same input will always have the same output, an attacker with a bunch of stolen hashes can just look for a matching hash in their rainbow table, and figure out what the plaintext password is, even though hash functions are irreversible.
+
+To prevent this, it is highly recommended to append random text to the password (each user should have a unique random text added) before hashing. This is called a salt. As long as that salt is stored, the password verifying process is the same - just add the stored salt before hashing. If this is done, rainbow tables are useless for attackers, since even if the user uses a common password, the random salt makes it so the hash will not be in the precomputed rainbow table. The attacker will have to generate a new rainbow table for every single user/salt, instead of just one for everyone! Salts are usually just stored wherever the hashed passwords are, but if they are kept hidden somewhere else, they are called "peppers".
+
+SALT PASSWORDS!!!
+
+## Digital Signatures
+
+In digital signatures, the hash of the message is usually signed, instead of the actual message, since the hash is guaranteed to be a certain size, which is usually smaller than the actual message, making it much easier to sign.
+
+## Proof-of-Work
+
+Hashes can be a way to impose a cost in energy. Most famously, Bitcoin and many other cryptocurrencies use PoW to reach consensus trustlessly (as long as more than 50% of the computing power isn't controlled by one entity), and some captchas also use PoW as an anti-spam. Taking Bitcoin as an example, it requires adding random bytes to the block data, then hashing it, until the hash starts with *n* number of zeroes, in order for the block to be mined (valid). Basically, it's just making millions of guesses about what random bytes appended to the block data will result in a hash that starts with the correct number of zeroes. Using more powerful computers and more energy results in more guesses in less time, making it more likely to find the right random bytes to add to mine the block. The more zeroes the hash is required to start with, the more difficult the problem is, and the more energy it will take to generate the work.
+
+## Key Derivation
+
+Hashing is also a great way to derive cryptographic keys. For example, a password would usually not be able to be an encryption key, since encryption keys are typically a fixed number of bytes long. So, a password can be hashed, turning it into the right length, so it can be used in cryptography.
+
+
+Hashes are cool.
--- a/posts/rushed_captcha_rewrite.md
+++ b/posts/rushed_captcha_rewrite.md
@@ -0,0 +1,21 @@
+Today, I noticed around a hundred failed faucet claims for a client's Discord bot. Wonderful.
+
+I checked the logs. Ah, the errors told me that the [captcha service](https://github.com/jetstream0/Captcha) was down. It looked like Replit, where I hosted the captcha service, had decided to remove all my environment variables. Annoying.
+
+The fix seemed fairly simple. But Replit was planning to discontinue free non-static hosting after the end of the year, and dealing with Replit's interface has been a generally unpleasant experience for me, so I decided to take the opportunity and migrate hosts. It had to be done sooner or later, after all.
+
+We wanted to keep hosting costs at $0, so I decided to move the service to render.com. Render's free tier allows for one free "Web Service", which is basically just any site with a backend that can be run with the resources given (512 MB, 0.1 CPU). It does require entering payment information (credit card) though, which my client didn't particularly want to do.
+
+Luckily, I made my client's Render account before this requirement, so it had a grandfathered-in web service.
+
+Unfortunately, it was already running the project's website. The website used to have the faucet, but since the faucet was moved to Discord (too much abuse otherwise), the site didn't *really* need a backend anymore. It did two database queries to display information like remaining claims for the month, but that was all. It was fairly simple to remove the site's backend, then have the Discord bot host a very simple API that the site's frontend would call to find that information.
+
+> Since I was hosting the Discord bot on Fly.io, which suspends free-tier VMs if there is no traffic to them, the bot already had a webserver that was being pinged to keep it from stopping. So tacking on the very very limited API took <2 minutes.
+
+Alright, since the site is now static, it can be hosted on GitHub Pages, and the captcha can be hosted on our newly available web service. As expected, it isn't that simple. I can change what repo the web service pulls from, which is good, but it's stuck in the [Node.js runtime](https://render.com/docs/native-runtimes), which I can't seem to change. This is a problem because I wrote the captcha service in Ruby, a language that is most definitely not Node.js (to prove it, note that "Ruby" and "Node.js" share no letters *and* are different lengths - therefore they must not be the same).
+
+I was fairly certain that if I deleted the web service, I would need to enter in payment info if I tried to create a new one. So that was not an option. I had to [rewrite the captcha service in Node.js](https://github.com/jetstream0/Captcha-node). It was mostly straightforward, but unluckily, I used a poorly documented cryptography library. There were a few limited examples, but I mostly had to look at the code and tests to figure out how it worked...
+
+And when I switched out salsa20 for xsalsa20 (larger nonce size means that csprngs can be securely used without fear of nonces being repeated), the output turned out to be too large to fit in the [`custom_id` of Discord buttons](https://discord.com/developers/docs/interactions/message-components#button-object-button-structure). I was forced to switch back to salsa20, and just decided to rotate keys every few hours or so (I did not have the energy to make the nonce an incrementing count, which would require storing the count in the database). Kinda stupid, but it's (probably) fine.
+
+Anyways, once I rewrote it in Node.js, I switched the web service to run that instead of the website, and everything started working again. Yay.
--- a/posts/ryuji_docs.md
+++ b/posts/ryuji_docs.md
@@ -39,7 +39,7 @@ Ryuji syntax is typically in the format `[[ something ]]` or `[[ some:thing ]]`
 ```html
 <ul>
  [[ for:trees:tree:index ]]
-  <!--index starts at zero, but you get the point-->
+    <!--index starts at zero, but you get the point-->
    <li>[[ index ]]. There is a [[ tree.type ]] tree that is [[ tree.height ]] metres tall.</li>
  [[ endfor ]]
 </ul>
--- a/posts/wikipedia_rabbitholes.md
+++ b/posts/wikipedia_rabbitholes.md
@@ -80,7 +80,6 @@ Here's a very incomplete (and maybe actively updated) list of ones that led to m
 - [Howard Zinn](https://en.wikipedia.org/wiki/Howard_Zinn), controversial historian
 - [Gnosticism](https://en.wikipedia.org/wiki/Gnosticism) was a set of early Christian beliefs
 - [TempleOS](https://en.wikipedia.org/wiki/TempleOS)
- [Grendel's mother](https://en.wikipedia.org/wiki/Grendel's_mother), an antagonist in the Old English epic poem, Beowulf, who scholars debate whether is a monster or just a female warrior
 - [Epic of Gilgamesh](https://en.wikipedia.org/wiki/Epic_of_Gilgamesh), Sumerian epic poem
 - [An Lushan Rebellion](https://en.wikipedia.org/wiki/An_Lushan_Rebellion), An Lushan rebels, greatly weakened the Tang Dynasty
 - [Bahmani Sultanate](https://en.wikipedia.org/wiki/Bahmani_Sultanate), South Indian empire
@@ -157,3 +156,52 @@ Here's a very incomplete (and maybe actively updated) list of ones that led to m
 - [Qt (software)](https://en.wikipedia.org/wiki/Qt_%28software%29), for developing GUIs
 - [Sailfish OS](https://en.wikipedia.org/wiki/Sailfish_OS)
 - [SerenityOS](https://en.wikipedia.org/wiki/SerenityOS)
+- [Puputan](https://en.wikipedia.org/wiki/Puputan), Balinese mass ritual suicide instead of surrender
+- [Harrying of the North](https://en.wikipedia.org/wiki/Harrying_of_the_North), the Normans put down English rebellions
+- [Austerity](https://en.wikipedia.org/wiki/Austerity)
+- [Muqtada al-Sadr](https://en.wikipedia.org/wiki/Muqtada_al-Sadr), Iraqi militia leader
+- [Messier objects](https://en.wikipedia.org/wiki/Messier_object), 110 non-comet space objects that the astronomer Messier recorded
+- [De jure](https://en.wikipedia.org/wiki/De_jure)
+- [Iona Nikitchenko](https://en.wikipedia.org/wiki/Iona_Nikitchenko), Soviet judge who had trouble writing a dissenting opinion because that was not done in Soviet law
+- [Interstate Highway System](https://en.wikipedia.org/wiki/Interstate_Highway_System), an American highway system
+- [Bushrangers](https://en.wikipedia.org/wiki/Bushranger), Australian escaped convicts turned outlaws
+- [Eyuwan Soviet](https://en.wikipedia.org/wiki/Eyuwan_Soviet), a Chinese government led by a rival of Mao Zedong
+- [Altai Mountains](https://en.wikipedia.org/wiki/Altai_Mountains)
+- [Republic of Genoa](https://en.wikipedia.org/wiki/Republic_of_Genoa)
+- [Moral rights](https://en.wikipedia.org/wiki/Moral_rights)
+- [Seaplane tender](https://en.wikipedia.org/wiki/Seaplane_tender), an early type of aircraft carrier
+- [List of cities founded by Alexander the Great](https://en.wikipedia.org/wiki/List_of_cities_founded_by_Alexander_the_Great)
+- [Microcode](https://en.wikipedia.org/wiki/Microcode), translates machine code into CPU operations
+- [Army of Sambre and Meuse](https://en.wikipedia.org/wiki/Army_of_Sambre_and_Meuse), a French revolutionary army
+- [Governor General of Canada](https://en.wikipedia.org/wiki/Governor_General_of_Canada)
+- [European Convention on Human Rights](https://en.wikipedia.org/wiki/European_Convention_on_Human_Rights)
+- [Túpac Amaru](https://en.wikipedia.org/wiki/T%C3%BApac_Amaru)
+- [Fossil](https://en.wikipedia.org/wiki/Fossil)
+- [Vi](https://en.wikipedia.org/wiki/Vi), a text editor
+- [Scheme (programming language)](https://en.wikipedia.org/wiki/Scheme_%28programming_language%29), a lisp dialect
+- [Trans-Neptunian object](https://en.wikipedia.org/wiki/Trans-Neptunian_object)
+- [Brackish water](https://en.wikipedia.org/wiki/Brackish_water), salty, but not that salty
+- [May Days](https://en.wikipedia.org/wiki/May_Days), infighting within the Spanish Republican faction
+- [List of HTTP header fields](https://en.wikipedia.org/wiki/List_of_HTTP_header_fields)
+- [Bo Xilai](https://en.wikipedia.org/wiki/Bo_Xilai), disgraced Chinese politician
+- [Zapatista Army of National Liberation](https://en.wikipedia.org/wiki/Zapatista_Army_of_National_Liberation)
+- [Materiel](https://en.wikipedia.org/wiki/Materiel)
+- [Mindanao](https://en.wikipedia.org/wiki/Mindanao), Filipino island
+- [Autonomous communities of Spain](https://en.wikipedia.org/wiki/Autonomous_communities_of_Spain)
+- [Ba'ath Party](https://en.wikipedia.org/wiki/Ba'ath_Party)
+- [Ink wash painting](https://en.wikipedia.org/wiki/Ink_wash_painting)
+- [Dahomey](https://en.wikipedia.org/wiki/Dahomey), West African kingdom
+- [Palmer Raids](https://en.wikipedia.org/wiki/Palmer_Raids), a series of raids by the US government to deport suspected leftists
+- [Stingray phone tracker](https://en.wikipedia.org/wiki/Stingray_phone_tracker), which mimics a cell tower
+- [Vettius Agorius Praetextatus](https://en.wikipedia.org/wiki/Vettius_Agorius_Praetextatus), 4th century Roman pagan aristocrat and high-ranking priest
+- [Inca architecture](https://en.wikipedia.org/wiki/Inca_architecture)
+- [Matteo Ricci](https://en.wikipedia.org/wiki/Matteo_Ricci), one of the most important priests of the Jesuit missions in China
+- [Pop art](https://en.wikipedia.org/wiki/Pop_art)
+- [Battle of Actium](https://en.wikipedia.org/wiki/Battle_of_Actium), where Octavian defeats Mark Antony
+- [Favourite](https://en.wikipedia.org/wiki/Favourite), a close companion to a ruler
+- [Heshen](https://en.wikipedia.org/wiki/Heshen), an extremely corrupt Chinese imperial official who amassed the equivalent of 270 billion USD
+- [List of richest Americans in history](https://en.wikipedia.org/wiki/List_of_richest_Americans_in_history)
+- [Proscription](https://en.wikipedia.org/wiki/Proscription), government decree declaring one an enemy of the state, originating in the late Roman Republic
+- [Cato the Younger](https://en.wikipedia.org/wiki/Cato_the_Younger), Roman politician
+- [Cicero](https://en.wikipedia.org/wiki/Cicero), prolific writer and Roman politician
+- [Moro people](https://en.wikipedia.org/wiki/Moro_people)
--- a/static/images/captcha.png
+++ b/static/images/captcha.png