An overview of the natechoe.dev architecture
To colleges: This PDF can be viewed as a blog post on my website at https://natechoe.dev/blog/2023-12-28.html
To everybody else: I'm sending this article to colleges, and uploading it to the website to get a free blog post. This website also has some easter eggs that I'd like to reveal to schools but not to the world, so when you get to those parts just close your eyes and scroll down.
Part 1: A high level overview
+-natechoe.dev---------------------------------------------------------+
|                                                                      |
|               +------------------+      +-------------------+        |
|               | raw page content |      |     C library     |        |
|               +------------------+      | (loaded by swebs) |        |
|                        |                +-------------------+        |
|                        |                          |                  |
+------------------------|--------------------------|------------------+
                         |                          |
+-container entrypoint---|--------------------------|------------------+
|                        |                          |                  |
+------------+           | 1. compiles              | 3. generates     |
|    ncdg    |-----------|                          |                  |
+------------+           v                          v                  |
|          +---------------------------+ +--------------------+        |
|          |cooked, static page content| |dynamic page content|        |
|          +---------------------------+ +--------------------+        |
+------------+           | 2. hosts                 |                  |
|   swebs    |-----------|                          |                  |
+------------+           |                          |                  |
|                        v                          v                  |
|                     +--------------------------------+               |
|                     |      A very nice website       |               |
|                     +--------------------------------+               |
|                                                                      |
+----------------------------------------------------------------------+
natechoe.dev is made of a bunch of parts, all working together.
- ncdg (natechoe.dev generator, stylized in all lowercase) compiles the website from the ncdg language to HTML
- swebs (simple web server) hosts that static page content
- swebs also loads a C library at runtime and dynamically generates some resources on request
Part 2: A low level overview
Here's the code that makes all of this work:
natechoe.dev:Dockerfile:1-11
FROM natechoe/ncdg AS ncdg
FROM natechoe/swebs
RUN apt-get update -y --allow-releaseinfo-change && apt-get upgrade -y && apt-get install -y make gcc
RUN [ "rm", "-rf", "/site" ]
COPY --from=ncdg /usr/bin/ncdg /usr/bin/ncdg
COPY ./site /site
RUN mkdir /secrets && mkdir /core && chmod 777 /core
# core is for core dumps
ENTRYPOINT /site/start.sh
natechoe.dev:site/start.sh:1-4
#!/bin/sh
make
swebs -s /site/sitefile -o /dev/stdout
natechoe.dev:site/Makefile:1-24
SRCS = $(shell find -name *.ncdg)
BLOG = $(shell find site/blog | grep -E "[0-9]{4}-[0-9]{2}-[0-9]{2}")
HTML = $(subst .ncdg,.html,$(SRCS))

all: $(HTML) site/sitemap.txt library.so site/blog/posts

library.so: library.c
	$(CC) $< -o $@ -shared -ansi -O2 -Wall -Wpedantic

site/blog/posts: site/blog/create-posts.sh $(BLOG)
	cd ./site/blog
	./site/blog/create-posts.sh

site/sitemap.txt: $(SRCS)
	find -name *.ncdg | sed "s/ncdg$$/html/g" | sed "s/^\.\/site/https:\/\/natechoe.dev/g" > $@
	echo "https://natechoe.dev/" >> $@

site/blog/index.html: site/blog/index.ncdg site/blog/posts
	ncdg site/blog/index.ncdg site/blog/index.html

%.html: %.ncdg
	ncdg $< $@

.PHONY: all
The Dockerfile will install swebs and ncdg, copy the raw page content, and set start.sh as the entrypoint. Then, start.sh will run a Makefile (which compiles a C library and generates HTML) and run swebs.
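For what it's worth, building and running the container locally should look something like this (the image tag here is my own invention, not something from the repo):

docker build -t natechoe.dev .
docker run --rm -p 80:80 natechoe.dev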
Part 3: ncdg
ncdg (natechoe.dev generator) is a very small text preprocessor that compiles to HTML. ncdg will process text based on the following four directives (technically five, but one's obsolete), removing any whitespace in the process:
- Include:
natechoe.dev:site/site/index.ncdg:1
@%/site/head.html@
will read and place /site/head.html into the file. Every page in natechoe.dev has this template:
@%/site/head.html@
@=header The title of this page, eg "An overview of the natechoe.dev architecture"@
<p>The HTML that makes up this page.</p>
@%/site/tail.html@
The included files contain some HTML boilerplate, CSS, and UI elements shared between pages.
- Variables:
natechoe.dev:site/site/blog/index.ncdg:1-3
@%/site/head.html@
@=header Welcome to my awesome blog!@
@=title The blog@
natechoe.dev:site/head.html:1-18
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
<meta name=viewport content=width=device-width,content-scale=1>
<link rel=stylesheet href=/autogen/colors.css>
<link rel=stylesheet href=/resources/style.css>
<title>natechoe.dev - @!title,header@</title>
</head>
<body>
<header>
<a href=@!diffdomain@/>natechoe.dev</a>
<a href=@!diffdomain@/blog/index.html>The blog</a>
<a href=@!diffdomain@/info/index.html>Contact info</a>
<a href=https://github.com/NateChoe1/natechoe.dev>The github repo</a>
</header>
<div id=content>
<h1>@!header@</h1>
site/site/blog/index.ncdg defines two variables: the page header and the title of the webpage. site/head.html will then read those variables and write the proper HTML accordingly. Note that there are some fallbacks here: in the @!title,header@ block, if ncdg can't find a "title" variable, it will fall back to the "header" variable.
- Automatic HTML encoding:
<pre><code class=block>@\
echo "hello world" > file.txt
echo "how are you doing?" >> file.txt
@</code></pre>
turns into
echo "hello world" &gt; file.txt
echo "how are you doing?" &gt;&gt; file.txt
The @\ directive automatically encodes HTML. It's mainly used for code blocks on this site (there's a sketch of this kind of encoding pass after this list).
- Shell scripting:
natechoe.dev:site/site/blog/index.ncdg
@$ /site/create-hub.sh /site/site/blog/posts "blog posts"@
will execute /site/create-hub.sh. This is used to create the index file for my blog. I don't want to manually create a directory of all my blog posts, so I call the create-hub.sh script and it does all the work for me.
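As promised, here's a minimal sketch of what an encoding pass like @\ could look like. This is just the general technique, not ncdg's actual source:

#include <stdio.h>

/* A sketch of automatic HTML encoding: copy input to output, replacing
 * characters that are special in HTML with their entities. This
 * illustrates the technique, not ncdg's real implementation. */
static void encodehtml(FILE *in, FILE *out) {
	int c;
	while ((c = fgetc(in)) != EOF) {
		switch (c) {
		case '<': fputs("&lt;", out); break;
		case '>': fputs("&gt;", out); break;
		case '&': fputs("&amp;", out); break;
		case '"': fputs("&quot;", out); break;
		default: fputc(c, out); break;
		}
	}
}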
The simplicity of ncdg lets me compile everything to HTML ahead of time, so I don't need any complex dynamic resource generation. It's not like there isn't any dynamic resource generation at all, though...
Part 4: swebs and that C library
swebs (simple web server, also stylized in all lowercase) has a config system called "sitefile".
natechoe.dev:site/sitefile:1-32
declare TCP 80
timeout 10000 80
set port 80
define library /site/library.so
set host www\\.natechoe\\.dev
set type text/html
read / /site/site/wrong.html
throw .* 404
set host .*
set type text/css
linked /autogen/colors.css
read .*\\.css /site/site/
set type text/html
read / /site/site/index.html
read .*\\.(html|ncdg) /site/site/
set type text/plain
read .*\\.txt /site/site/
read /info/public.key /site/site/info/public.key
set type image/png
read .*\\.png /site/site/
set type text/javascript
read .*\\.js /site/site/
# Ew javascript.
A sitefile is a series of commands that each request will go through. The define command will set a global variable that swebs understands. The set command will set some condition that future commands have to follow. For example, set host www\\.natechoe\\.dev says "only accept requests with a host value that matches the regular expression www\\.natechoe\\.dev".
Then there are resources. read / /site/site/wrong.html says "when / is requested, respond with the data at /site/site/wrong.html". This is an easter egg on my website: when you visit https://www.natechoe.dev rather than https://natechoe.dev, you get a message saying that you're in the wrong place.
This is great, but we don't have dynamic pages. That's where the "linked" resource comes in. We can load a shared object file (a C library) into our program that will dynamically generate web pages.
natechoe.dev:site/library.c:1-60
#include <time.h>
#include <stdio.h>
#include <string.h>

#include <swebs/util.h>
#include <swebs/swebs.h>

static long currday = -1;
static char buff[300];
static int currlen;

static int getcolors(Request *request, Response *response);

int getResponse(Request *request, Response *response) {
	if (strcmp(request->path.path.data, "/autogen/colors.css") == 0)
		return getcolors(request, response);
	response->type = DEFAULT;
	return 404;
}

static int getcolors(Request *request, Response *response) {
	long realday;
	{
		time_t currtime;
		const time_t reference = 1655182800;
		/* Midnight of the day of implementation in CST */
		const int perday = 86400;
		/* Seconds per day */
		currtime = time(NULL);
		if (currtime == -1)
			realday = 0;
		else
			realday = (currtime - reference) / perday;
	}

	if (currday != realday) {
		int color;
		const int initial = 203;
		/* The initial color at the time of implementation */
		color = (realday + initial) % 360;
		currday = realday;
		currlen = sprintf(buff,
			":root{"
			"--backcol:hsl(%d,93%%,84%%);"
			"--doccol:hsl(%d,92%%,75%%);"
			"--shadowcol:#444444;"
			"--codeback:#d3d3d3;"
			"--codecol:#000000;"
			"--barcol:hsl(%d,96%%,68%%);"
			"--textcol:#000000;"
			"} /*%ld %ld*/", color, color, color, currday, realday
		);
	}

	response->type = BUFFER_NOFREE;
	response->response.buffer.data = buff;
	response->response.buffer.len = currlen;
	return 200;
}
This is another easter egg. The C library dynamically generates responses to requests for /autogen/colors.css, which defines the colors used on the website. The specific hues chosen change by one degree on the color wheel every day, repeating every 360 days.
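Since the stylesheet is generated on request, you can check the current day's palette with something like curl:

curl https://natechoe.dev/autogen/colors.css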
Part 5: More about swebs
swebs has a strange architecture inspired by nginx. We've seen how it's configured through sitefiles, but we haven't seen how it processes requests internally.
Like nginx, swebs has a multi-processing architecture with two types of processes: the main process and the runner processes. The main process will accept any connections and respawn dead runner processes. It has an incredibly simple event loop.
swebs:src/main.c:179-203
for (;;) {
	createLog("poll() started");
	if (poll(pollfds, site->portcount, -1) < 0) {
		if (errno == EINTR)
			continue;
		createErrorLog("You've majorly screwed up. Good luck",
				errno);
		exit(EXIT_FAILURE);
	}
	createLog("Accepted stream");

	for (i = 0; i < site->portcount; ++i) {
		if (pollfds[i].revents & POLLIN) {
			int j, lowestproc, fd;
			fd = acceptConnection(listeners[i]);
			lowestproc = 0;
			for (j = 0; j < processes - 1; j++)
				if (pending[j] < pending[lowestproc])
					lowestproc = j;
			sendFd(fd, runners[lowestproc].fd, &i, sizeof i);
			close(fd);
		}
	}
}
Let's just ignore the vague error messages for a bit and talk about what this code is actually doing. We've got a bunch of server sockets in the pollfds variable, and we're waiting for any of them to get a connection. Once we get one, we accept it and send that connection to the least-busy runner process.
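sendFd itself isn't shown in this post, but handing an open file descriptor to another process over a Unix domain socket is normally done with SCM_RIGHTS ancillary data, so here's a minimal sketch of the technique (the function name is made up; this is not swebs's actual sendFd):

#include <string.h>
#include <sys/socket.h>

/* A sketch of SCM_RIGHTS fd passing: send a regular payload plus an open
 * file descriptor to the process on the other end of a Unix socket. */
static int sendfd_sketch(int sock, int fd, void *data, size_t len) {
	struct msghdr msg;
	struct iovec iov;
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof msg);
	iov.iov_base = data; /* the normal payload (e.g. the port index) */
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf; /* ancillary data carries the fd */
	msg.msg_controllen = sizeof u.buf;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

Passing the descriptor itself, instead of proxying bytes between processes, is what lets the main process stay this simple.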
The runner process has this slightly longer event loop:
swebs:src/runner.c:108-160
for (;;) {
	pollConnList(&conns);
	createFormatLog("poll() finished with %d connections",
			conns.len);
	for (i = 1; i < conns.len; i++) {
		if (conns.fds[i].revents & POLLIN) {
			createFormatLog("Connection %d has data", i);
			if (updateConnection(conns.conns + i, site)) {
				freeConnection(conns.conns + i);
				removeConnList(&conns, i);
				--i;
			}
		}
	}

	if (conns.fds[0].revents & POLLIN) {
		Stream *newstream;
		Connection newconn;
		int portind;
		struct pollfd newfd;

		createLog("Main fd has data");
		newfd.fd = recvFd(connfd, &portind, sizeof portind);
		if (newfd.fd < 0) {
			createLog("Message received that included an invalid fd, quitting");
			exit(EXIT_FAILURE);
		}
		newfd.events = POLLIN;
		newstream = createStream(contexts[portind],
				O_NONBLOCK, newfd.fd);
		if (newstream == NULL) {
			createLog("Stream couldn't be created from file descriptor");
			shutdown(newfd.fd, SHUT_RDWR);
			close(newfd.fd);
			continue;
		}
		if (newConnection(newstream, &newconn, portind)) {
			createLog("Couldn't initialize connection from stream");
			continue;
		}
		if (addConnList(&conns, &newfd, &newconn)) {
			freeConnection(&newconn);
			continue;
		}
		pending[id]++;
	}
}
Each runner process just waits for data. When it receives data on a connection, it will process that data, and then remove that connection if necessary. If a runner receives a connection from the main process, it will add that connection to the list. This is a pretty simple architecture, but because we're not spawning a new thread for each connection, the number of concurrently running jobs is O(1). Multithreading is for noobs, real programmers do concurrency manually.
Part 6: The surrounding architecture
When I first wrote swebs and bought natechoe.dev, I couldn't host it directly from my house because my Dad's websites were already hosted there and we only had a single public IP address. Around that time, though, my Dad discovered this thing called Docker, so we set up a dockerized reverse proxy and I spun up swebs on a virtual machine. I dockerized natechoe.dev a few months later, then set up email a couple months after that using a service called docker-mailserver. At some point I switched from Google Domains to Porkbun and needed DNSSEC. I didn't want to use Cloudflare's DNS for some reason that I can't quite remember, so I set up a BIND9 docker image from my house.
Then I built a 24-core computer, and my Dad and I wanted to move everything there. For some reason he hosts his websites on this obscure operating system mostly used for gaming called "Windows"? Anyways, I needed a virtual machine, but the only dockerized QEMU I could find didn't have VNC or QCOW2 support, so I wrote my own.
I should note that my Dad had no influence on this system other than helping to set up the reverse proxy and buying the computer parts. I set up DNS, email, built the computer, and migrated everything myself.