How I built this site
I was originally going to use a CMS to manage this website, but I had to give up on that. Throughout the week, the only Internet-connected device I have on me is a phone, and the Web-based interfaces of most online tools don't work well on such a tiny screen. After fiddling with Grav and WordPress, I decided to put Termux on my phone and write a static site generator instead.
Why write one? Because I find the vast majority of the existing generators too bloated, too difficult to use, and too much of a pain to set up. Just take my Markdown files, apply a handful of templates, and give me a pile of static HTML files. I don't care about image boxes, share buttons, Sass, SEO, annoying ads, tracking pixels, or advanced analytics. And I certainly don't want my pages turned into <div> soup because that's the way some framework author thought a page should look. I was also looking for a fun weekend project.
Gathering the pieces
So I gathered the required components and got to work. The first thing I looked for was a template engine. Mustache was towards the top of the list because of the large number of implementations in various programming languages. But then I found The Template Toolkit, which has one of the best syntaxes I've ever seen in a template engine. It allows for both wrapping and injecting template code, and variables can be defined in the templates themselves and passed to other templates. The documentation is pretty good too.
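To give a flavor of that syntax, here's a minimal, self-contained sketch (assuming the Template Toolkit is installed from CPAN; the variable names are just illustrative). One variable is defined inside the template itself, and another is injected from Perl:

```perl
use strict;
use warnings;
use v5.12;
use Template;

my $tt = Template->new() || die "$Template::ERROR\n";

# A variable can be assigned right inside a [% ... %] directive,
# and plain [% name %] tags inject values passed in from Perl.
my $template = "[% greeting = 'Hello' %][% greeting %], [% name %]!";

my $out = "";
$tt->process(\$template, { name => "world" }, \$out) || die $tt->error();
say $out;   # Hello, world!
```

Passing a scalar reference to process() treats the string itself as the template, which is handy for quick experiments before moving templates into files.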
The fact that it was a Perl library pretty much dictated the language I would use for the project. Perl gets a bad reputation because its flexible syntax allows for some pretty awful-looking code. It's also easy to shoot yourself in the foot if you're not careful. But, if one takes the time to actually write comments and not try to write an entire program on one line, it's actually a very fast and stable language. Sure beats the pants off writing Bash scripts anyway.
The next component I needed was a Markdown parser. After trying a few libraries on CPAN, I chose Markdown::Perl for its nice set of features.
The final major component I needed was a YAML parser. With a quick search I found YAML::XS.
Two more libraries, File::Copy::Recursive and File::Find, provided some nice utility functions. From there, it only took a bit of trial and error to write a program in less than 400 SLOC to do what I wanted.
As an added bonus, I decided to add Plack::App::Directory to provide a local testing server. Of course I could have used darkhttpd or python -m http.server, but I figured that since the rest of my "solution" was in Perl, I may as well use a testing server written in Perl. That, and I wanted to play with Plack a bit.
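For reference, a directory-serving Plack app can be tiny. This is a sketch of what such a file could look like (the app.psgi filename is just the plackup convention, and plackup listens on port 5000 by default):

```perl
# app.psgi -- serve the generated site for local testing.
use strict;
use warnings;
use Plack::App::Directory;

# Serve everything under output/, with directory listings.
Plack::App::Directory->new({ root => "output" })->to_app;
```

Running plackup app.psgi and browsing to http://localhost:5000 then shows the generated site.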
I'm calling the resulting program Robert's Own Generator and I will eventually get around to posting it on my site so that everyone can laugh at my bad code.
The rest of this article goes over the functions and my thought process.
Deciding on a structure
Static site generators generally come in two flavors. The first flavor is the kitchen sink variety that comes with a bunch of commands. They often have a new xxx command that creates a project directory and copies a bunch of files into it. The second flavor is download and modify. That's where you clone a repository or download an archive, modify some variables, and run the program. The second flavor is what I decided to do.
Let's look at the directory layout:
|- content
|- output
|- static
|- templates
That's right, no dedicated theme directory. Instead, all of the templates go in one directory, and all of the static assets (icons, JS libraries, etc) go in another directory. Content lives in content/ and output is sent to output/. Then the program itself lives in the same directory and uses relative paths.
With the directories set up, I started writing the build script. As with most scripts I write, I like to import all of the dependencies at the top of the file. Every language has a different keyword for this; for Perl, it's use.
use strict;
use warnings;
use v5.12;
use Data::Dumper;
use List::Util;
use YAML::XS qw(Load Dump);
use File::Copy::Recursive qw(fcopy rcopy dircopy fmove rmove dirmove);
use File::Find;
use Template;
use Markdown::Perl;
After that, I set up the dependencies to access their functions. The code required to do this varies widely from one library to the next.
# Consult the docs for Markdown::Perl before changing.
my $md_mode = "github";
my %md_options = (
    parse_file_metadata => "none",
    multi_lines_setext_headings => "ignore"
);
# Setup the template toolkit
my $tt = Template->new({
    INCLUDE_PATH => 'templates',
    INTERPOLATE => 0,
}) || die "$Template::ERROR\n";
Writing and reading files
Reading and writing files with Perl may take a bit more effort than other languages, but it's not that hard.
Writing files was easy.
sub write_file {
    my $text = shift;
    my $outpath = shift;
    open my $fh, '>', $outpath or die "Can't open output file $outpath! $!";
    print $fh "$text\n";
    close $fh or die "Can't close file $outpath! $!";
}
Slurping required a bit more thought. "Slurping" is the act of reading the entire content of a file into one multi-line string. I saw a few libraries on CPAN that claimed to make this easy, but I thought slurping was the kind of thing Perl should be able to do without a lot of code. So I rummaged through the docs with the perldoc command and eventually came up with this:
sub slurp_file {
    my $path = shift;
    # Open the file for reading as Unicode characters.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Could not read file at $path! $!";
    # The input record separator influences what Perl's idea of a "line" is.
    # Undefining it sets <> to "slurp mode"...
    local $/;
    # ...which makes this read the whole file into a string.
    my $contents = <$fh>;
    # Don't forget to close the file!
    close $fh or die "Could not close file at $path! $!";
    # Return the contents.
    return $contents;
}
I remember reading somewhere that specifying an encoding was important, so that's what I did here. Since I don't ever intend to use the program on non-UTF-8 systems, this should be fine and it works as expected.
Copying the files
With all the basics covered, I began writing the main build function. I figured that the first step should be to just copy the contents of content/ to output/. Then I could reach into output/, transform only the Markdown files into HTML, and leave the rest alone. The easiest way to do that was the dircopy function from File::Copy::Recursive. Then copying files was as simple as this:
dircopy("content", "output");
Getting the list of Markdown files
The next step was to build a list of Markdown files. After a bit of trial and error, I came up with this beauty:
sub find_recurse {
    my ($pattern, @dirs) = @_;
    # If the directories aren't specified, use only the current one.
    if (!@dirs) {
        @dirs = (".");
    }
    # This is where the list of paths will go.
    my @ret = ();
    find(
        # This is basically a lambda, a function without a name.
        # It runs on every file found under the @dirs directories,
        # and if the pattern matches, it adds the path to @ret.
        sub {
            # The $_ variable is implied here.
            if (/$pattern/) {
                push(@ret, $File::Find::name);
            }
        },
        # List of directories to search
        @dirs
    );
    return @ret;
}
Which gets called like this:
my @mdFiles = find_recurse(qr/\.md$/, "output");
Yes, Python and Ruby programmers will look at that and comment on how cursed it looks, but I've seen much worse in C. To the uninitiated, Perl uses prefixed symbols (sigils) to tell different kinds of variables apart:

@ is for arrays
% is for hashes
$ is for scalars

Perl also allows accessing variables in different contexts. For example, you refer to the entire array with @array_name, but single elements with $array_name[index]. The parts that look like /this/ are regular expressions, or just "regex".
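A quick standalone sketch of those sigils and contexts in action (none of these variables come from the generator):

```perl
use strict;
use warnings;
use v5.12;

# A whole array uses the @ sigil...
my @langs = ("Perl", "Python", "Ruby");
# ...and evaluating it in scalar context yields its length.
say "Count: " . scalar @langs;    # Count: 3

# A single element is a scalar, so it takes the $ sigil.
say $langs[0];                    # Perl

# Hashes use % as a whole, $ for one value.
my %ages = (alice => 42, bob => 7);
say $ages{bob};                   # 7

# Inside grep, the regex matches against the implied $_ variable.
my @p_langs = grep { /^P/ } @langs;
say "@p_langs";                   # Perl Python
```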
Also featured in the find_recurse function above is the strange $_ variable. The docs call it the "default input and pattern-searching space" variable. It's a bit like the English word "it", and it's implied more than it's actually written out.
Stripping and parsing front-matter
Once I got the stack of files, the next step was to separate the front-matter (page metadata) from the content. With most static site generators, the metadata is written in YAML and sits at the top of the file between a pair of triple-delimiter lines like this:
---
title: Title here
date: 2025-01-01
tags:
- tag 1
- tag 2
---
Content here.
I decided to do the same thing, but omit the top line of delimiters. Oddly enough, some of the Markdown parsers I looked at render the metadata as part of the content. To prevent this, I wrote a little function to both isolate the front-matter and remove it from the text.
sub split_front_matter {
    my $text = shift;
    # Split text into lines
    my @lines = split("\n", $text);
    my @metaLines = ();
    my @contentLines = ();
    # Send lines to @metaLines by default...
    my $fenceOpen = 0;
    foreach (@lines) {
        # ...until we see the triple dashes.
        # Setting the flag to 1 routes the remaining lines to @contentLines.
        if ($_ eq "---" && $fenceOpen == 0) {
            $fenceOpen = 1;
            next;
        }
        if ($fenceOpen == 0) {
            push(@metaLines, $_);
        } else {
            push(@contentLines, $_);
        }
    }
    # Join the YAML and the content back into strings
    my $pageYaml = join("\n", @metaLines);
    my $pageContent = join("\n", @contentLines);
    # Return as an array
    my @splitPage = ($pageYaml, $pageContent);
    return @splitPage;
}
So, this function returns a pair of multi-line strings: one for the YAML, and one for the Markdown, in that order. Here's what running the function looks like:
my @splitText = split_front_matter($text);
my $yaml = $splitText[0];
my $md = $splitText[1];
Generating the page hash
Step the next was turning the YAML into a hash I could shove into an array.
# Generate the page hash by parsing the yaml.
my $pageHashRef = Load($yaml);
# Deref
my %pageHash = %{$pageHashRef};
Interestingly, the Load function returns a hash reference, not the hash itself. To get the hash, I had to dereference it. References in Perl can be tricky. In general, a reference is a scalar value that points at another value. To get the value itself, you have to know what kind of value it is, and tell Perl to transform it. This is one area where Ruby and other languages have an advantage when dealing with nested data structures.
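A small standalone example of taking and dereferencing a reference (again, these variables are just for illustration):

```perl
use strict;
use warnings;
use v5.12;

my @menu = ({ name => "About" }, { name => "Contact" });

# The backslash takes a reference: a scalar pointing at the array.
my $menu_ref = \@menu;

# @{...} dereferences it back into an array.
my @copy = @{$menu_ref};
say scalar @copy;            # 2

# The arrow syntax reaches into nested structures without
# dereferencing by hand.
say $menu_ref->[0]{name};    # About

# ref() tells you what kind of value a reference points at.
say ref $menu_ref;           # ARRAY
```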
After I got the hash, it was time to render the Markdown and add it.
my $content = Markdown::Perl::convert($md);
# Add the content
$pageHash{'content'} = $content;
Configuration and templates
Instead of using a separate config file for site variables, I decided to just use a hash.
my %site = (
    url => 'https://www.rjkcodes.site',
    testUrl => 'http://localhost:5000',
    author => 'RJK',
    name => 'RJK Codes'
);
my @menu = (
    {
        name => "About",
        url => "about.html"
    },
    {
        name => "Contact",
        url => "contact.html"
    }
);
$site{'menu'} = \@menu;
The testUrl above is for Plack. Another thing that may require explaining is the \@ bit. With Perl, you can't just ram an array (or a hash) into a hash; you have to use a reference. That's what the \ is for.
And here's what that looks like in a template.
...
<header id="site-header">
    <a href="[% site.url %]/index.html" id="site-name">[% site.name %]</a>
    <nav id="site-nav">
        <ul>
            [% FOREACH item IN site.menu %]
            <li><a href="[% site.url %]/[% item.url %]">[% item.name %]</a></li>
            [% END %]
        </ul>
    </nav>
</header>
<main>
...
The stuff between [% and %] is the Template Toolkit syntax. Here's the template rendering function.
sub render_template {
    # Takes a page hash, a site hash, and a list of pages.
    my %page = %{$_[0]};
    my %site = %{$_[1]};
    my @pageList = @{$_[2]};
    unless (exists $page{'template'}) {
        $page{'template'} = "page.html";
    }
    # TT needs a wrapper hash
    my %tvars = (
        site => \%site,
        page => \%page,
        pageList => \@pageList
    );
    # process() needs a reference to the output string or we get an error
    my $output = "";
    $tt->process($page{'template'}, \%tvars, \$output) || die $tt->error(), "\n";
    return $output;
}
So the way this works is that every page needs a template key in its metadata. If it's missing, the default template page.html is added. I also wrapped the site variables, page variables, and the entire page list into one big hash. This makes the top-level keys site, page, and pageList available for use in templates.
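To make that concrete, a minimal page.html built around those keys could look something like this (a sketch; the real templates aren't shown in this article):

```html
<!DOCTYPE html>
<html>
<head>
    <title>[% page.title %] - [% site.name %]</title>
</head>
<body>
    <h1>[% page.title %]</h1>
    [% page.content %]
</body>
</html>
```

Since page.content holds the already-rendered HTML from Markdown::Perl, the template only has to drop it into place.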
CLI
My generator has two commands: build and publish. Both commands rebuild the site, but "publish" uses the proper site URL, while "build" uses the test URL. I wanted to add a "serve" command, but Plack seems to require that its code live in its own file.
As for parsing the commands, I wrote another function. Again, there are libraries to help with this on CPAN, but I wanted a simpler solution.
# Let's process input from the command line
sub print_help {
    say "#+- Robert's Own Generator v0.1 -+#\n";
    say "Available commands:";
    say "build -- Build the site with a LOCAL URL for testing.";
    say "publish -- Build the site with the PROPER URL for publishing to a server.\n";
    say "To view a site locally WITHOUT Plack, either start the desired server on ";
    say "port 5000, or change the 'testUrl' key in %site to whatever you require.";
}
my $argLength = scalar @ARGV;
if ($argLength != 1) {
    print_help;
} else {
    if ($ARGV[0] eq "build") {
        say "Building site with LOCAL URLs for testing.";
        build_site("no");
        say "DONE";
    } elsif ($ARGV[0] eq "publish") {
        say "Building site with PROPER URLs for publishing.";
        build_site();
        say "DONE";
    } else {
        say "Unknown command!";
        print_help;
    }
}
Main build function and tag page generation
I should have probably split tag page generation off into a separate function, but whatever. Here's the code.
sub build_site {
    my $publish = shift;
    # If we're not publishing, replace the URL for local testing.
    if (defined $publish) {
        if ($publish eq "no") {
            $site{'url'} = "$site{'testUrl'}";
        }
    }
    # Copy everything to the output directory.
    # This way, we grab images and other page assets.
    dircopy("content", "output");
    # Find all the Markdown files
    my @mdFiles = find_recurse(qr/\.md$/, "output");
    # Build the page list
    my @pageList = ();
    foreach (@mdFiles) {
        my $text = slurp_file($_);
        my %phash = gen_page_hash($text);
        # Generate the outpath
        my $outpath = $_ =~ s/content/output/r;
        $outpath =~ s/\.md/\.html/;
        $phash{'outpath'} = $outpath;
        # Fix the URL
        my $url = $outpath =~ s/output/$site{'url'}/r;
        $phash{'url'} = $url;
        push(@pageList, \%phash);
        # Delete the source file
        unlink($_);
    }
    @pageList = sort_pages_date(@pageList);
    # Generate tag pages
    # Start with the master tag list
    my @siteTags = ();
    foreach (@pageList) {
        my %p = %{$_};
        if (exists $p{'tags'}) {
            # Deref
            my @tags = @{$p{'tags'}};
            foreach (@tags) {
                push(@siteTags, $_);
            }
        }
    }
    # Remove duplicates
    @siteTags = List::Util::uniq @siteTags;
    # Add the tags to the site hash
    $site{'tags'} = \@siteTags;
    # Add a link to the menu
    my %menuItem = (
        name => "Tags",
        url => "tags.html"
    );
    push(@{$site{'menu'}}, \%menuItem);
    # Write the tag list page.
    # The date and content don't really matter. We put them there to prevent
    # errors.
    my %tagListPageHash = (
        template => "tagListPage.html",
        title => "Tags",
        content => "",
        date => "2025-03-03"
    );
    my $tagListPageText = render_template(\%tagListPageHash, \%site, \@pageList);
    write_file($tagListPageText, "output/tags.html");
    # Create the tags directory
    mkdir("output/tags");
    # Now we iterate over the tags and generate the pages.
    # The template does most of the heavy lifting.
    foreach (@siteTags) {
        my %tagPageHash = (
            template => "tagPage.html",
            title => "$_",
            tag => "$_",
            content => "",
            date => "2025-03-03"
        );
        my $tagPageText = render_template(\%tagPageHash, \%site, \@pageList);
        write_file($tagPageText, "output/tags/$_.html");
    }
    # Now we iterate over the page list and write all the files.
    foreach (@pageList) {
        my %pageHash = %{$_};
        # Render the template
        my $text = render_template(\%pageHash, \%site, \@pageList);
        # Write the output
        my $outpath = $pageHash{'outpath'};
        write_file($text, $outpath);
    }
    # Render the templates for the main JS and CSS files.
    # Because of the global nature of these files, it doesn't make sense to
    # provide anything more than the site hash.
    my %tvars = (
        site => \%site,
    );
    my $js = "";
    $tt->process("main.js", \%tvars, \$js) || die $tt->error(), "\n";
    write_file($js, "output/main.js");
    my $css = "";
    $tt->process("main.css", \%tvars, \$css) || die $tt->error(), "\n";
    write_file($css, "output/main.css");
    # Copy all static assets.
    dircopy("static", "output");
}
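One piece the listing above calls but doesn't show is sort_pages_date. As a sketch (the real implementation may differ), a version that simply orders pages newest-first by their ISO date strings would look like this:

```perl
use strict;
use warnings;

# Sketch of sort_pages_date: ISO 8601 dates (YYYY-MM-DD) compare
# correctly as plain strings, so a reversed string comparison on the
# 'date' key puts the newest page first.
sub sort_pages_date {
    my @pages = @_;
    return sort { $b->{date} cmp $a->{date} } @pages;
}
```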
This is still a work in progress. I will post the full program soon.