Simple NGINX Authentication Hack with Bottle

I want to share a scenario I ran into and a quick hack to solve it. I administer an internet-facing system that hosts some private git repositories for friends using Gitea, with NGINX as the outward-facing web server (sitting between Gitea and the internet). For a while this worked fine: there were no other pages or applications hosted on this server, so I could rely entirely on Gitea’s local authentication and user management.

I recently ran into a problem, however: Gitea doesn’t seem to have many real user-authoring or wiki-like features built in, and we needed a wiki to enhance collaboration on a shared project. I really like the lightweight Oddmuse wiki software, but it doesn’t ship with authentication built in, and I really wanted a single, unified system of authentication for this server.

I decided I was OK with HTTP basic auth (which is reasonably secure so long as all of your connections are HTTPS). A very common way to make HTTP basic auth work is with “htpasswd” files (I believe these originated with Apache HTTPD, but they have long been supported in NGINX and Lighttpd, among other web servers). This works OK sometimes, but Gitea stores its authentication data differently (in its own database, with different hash formats), and in general I’ve found that keeping the two updated and synchronized is hard. If someone wants to reset their password, you need to go update the htpasswd file manually, or invent some other (usually ugly) way to handle it. You can read more about htpasswd style authentication for NGINX here.

Another typical choice for adding authentication to web servers is LDAP. While it is a very complete and robust solution, I have found LDAP to be an absolute nightmare to set up and administer (or even understand), and it feels relatively heavyweight for a scenario such as this. For a larger group of people or many servers it is likely appropriate, but it’s not what I want to use here, as I value my time too much to go figure out all of LDAP’s complexity again.

At this point, I wanted to see how Gitea stores its users and authentication data. My initial thought was that, if I could figure out how Gitea manages users and authentication, I could write an NGINX extension in C that uses this data to authenticate requests. I use a SQLite3 database with Gitea, as the system is relatively low volume. Enumerating the tables in Gitea’s database (typically stored at /var/lib/gitea/data/gitea.db if you’re using SQLite3) with the handy sqlite3 command line tool yields the following:

sqlite> .tables
access                     oauth2_grant             
access_token               oauth2_session           
action                     org_user                 
attachment                 protected_branch         
collaboration              public_key               
comment                    pull_request             
commit_status              reaction                 
deleted_branch             release                  
deploy_key                 repo_indexer_status      
email_address              repo_redirect            
external_login_user        repo_topic               
follow                     repo_unit                
gpg_key                    repository               
gpg_key_import             review                   
hook_task                  star                     
issue                      stopwatch                
issue_assignees            task                     
issue_dependency           team                     
issue_label                team_repo                
issue_user                 team_unit                
issue_watch                team_user                
label                      topic                    
lfs_lock                   tracked_time             
lfs_meta_object            two_factor               
login_source               u2f_registration         
milestone                  upload                   
mirror                     user                     
notice                     user_open_id             
notification               version                  
oauth2_application         watch                    
oauth2_authorization_code  webhook
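If you would rather explore the database from Python than from the sqlite3 CLI, the same listing can be pulled from SQLite’s sqlite_master catalog. This is a minimal sketch using only the standard library’s sqlite3 module; list_tables is a hypothetical helper name, and it skips SQLite’s internal tables (such as sqlite_sequence, which exists because the user table uses AUTOINCREMENT).

```python
import sqlite3


def list_tables(connection):
    """Return user-defined table names, skipping SQLite's internal tables."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

To point it at a live Gitea install, opening the file read-only is a sensible precaution, for example sqlite3.connect("file:/var/lib/gitea/data/gitea.db?mode=ro", uri=True), assuming the default path mentioned above.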

So, there are many tables here, but it turns out (for local authentication) the user table has pretty much what we need. Here is the schema for the user table:

sqlite> .schema user
CREATE TABLE `user` (
  `id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, 
  `lower_name` TEXT NOT NULL, 
  `name` TEXT NOT NULL, 
  `full_name` TEXT NULL, 
  `email` TEXT NOT NULL, 
  `keep_email_private` INTEGER NULL, 
  `email_notifications_preference` TEXT DEFAULT 'enabled' NOT NULL, 
  `passwd` TEXT NOT NULL, 
  `passwd_hash_algo` TEXT DEFAULT 'pbkdf2' NOT NULL, 
  `must_change_password` INTEGER DEFAULT 0 NOT NULL, 
  `login_type` INTEGER NULL, 
  `login_source` INTEGER DEFAULT 0 NOT NULL, 
  `login_name` TEXT NULL, 
  `type` INTEGER NULL, 
  `location` TEXT NULL, 
  `website` TEXT NULL, 
  `rands` TEXT NULL, 
  `salt` TEXT NULL, 
  `language` TEXT NULL, 
  `description` TEXT NULL, 
  `created_unix` INTEGER NULL, 
  `updated_unix` INTEGER NULL, 
  `last_login_unix` INTEGER NULL, 
  `last_repo_visibility` INTEGER NULL, 
  `max_repo_creation` INTEGER DEFAULT -1 NOT NULL, 
  `is_active` INTEGER NULL, 
  `is_admin` INTEGER NULL, 
  `allow_git_hook` INTEGER NULL, 
  `allow_import_local` INTEGER NULL, 
  `allow_create_organization` INTEGER DEFAULT 1 NULL, 
  `prohibit_login` INTEGER DEFAULT 0 NOT NULL, 
  `avatar` TEXT NOT NULL, 
  `avatar_email` TEXT NOT NULL, 
  `use_custom_avatar` INTEGER NULL, 
  `num_followers` INTEGER NULL, 
  `num_following` INTEGER DEFAULT 0 NOT NULL, 
  `num_stars` INTEGER NULL, 
  `num_repos` INTEGER NULL, 
  `num_teams` INTEGER NULL, 
  `num_members` INTEGER NULL, 
  `visibility` INTEGER DEFAULT 0 NOT NULL, 
  `repo_admin_change_team_access` INTEGER DEFAULT 0 NOT NULL, 
  `diff_view_style` TEXT DEFAULT '' NOT NULL, 
  `theme` TEXT DEFAULT '' NOT NULL
);
CREATE UNIQUE INDEX `UQE_user_name` ON `user` (`name`);
CREATE UNIQUE INDEX `UQE_user_lower_name` ON `user` (`lower_name`);
CREATE INDEX `IDX_user_created_unix` ON `user` (`created_unix`);
CREATE INDEX `IDX_user_updated_unix` ON `user` (`updated_unix`);
CREATE INDEX `IDX_user_last_login_unix` ON `user` (`last_login_unix`);
CREATE INDEX `IDX_user_is_active` ON `user` (`is_active`);

Examining the schema, it looks like the information we need to authenticate users is stored entirely in this table. Great! The columns to pay attention to are probably name, passwd (presumably the hash value), passwd_hash_algo, type, salt, is_active, and prohibit_login. A quick dump of the users (with the sqlite3 tool in .mode line) yields records such as:

sqlite> select * from user;
                            id = 1
                    lower_name = foobar
                          name = Foobar
                     full_name = Foo Bar
                         email = foo@bar.com
            keep_email_private = 0
email_notifications_preference = enabled
                        passwd = 056577a98e56c10f7084f2916c163785e409d3fb9f8f5251ec747f24d639f6ae73750f29da068a090ef24c4bfc115deb178c
              passwd_hash_algo = pbkdf2
          must_change_password = 0
                    login_type = 0
                  login_source = 0
                    login_name = 
                          type = 0
                      location = 
                       website = 
                         rands = CGkQd8yAmC
                          salt = oZM0lIBZQz
                      language = en-US
                   description = 
                  created_unix = 1574639131
                  updated_unix = 1574639131
               last_login_unix = 1574639131
          last_repo_visibility = 0
             max_repo_creation = -1
                     is_active = 1
                      is_admin = 0
                allow_git_hook = 0
            allow_import_local = 0
     allow_create_organization = 1
                prohibit_login = 0
                        avatar = c8af9bdacc70eceaade55fe2b572daa3
                  avatar_email = foo@bar.com
             use_custom_avatar = 0
                 num_followers = 999
                 num_following = 999
                     num_stars = 999
                     num_repos = 999
                     num_teams = 0
                   num_members = 0
                    visibility = 0
 repo_admin_change_team_access = 0
               diff_view_style = 
                         theme = gitea

A couple of things to notice here: this (fake) user, like all users in the Gitea database by default, uses the pbkdf2 hashing algorithm, which is fortunately relatively strong and pretty common (Python’s built-in hashlib supports pbkdf2 out of the box). If you count the hex characters in the passwd string, you’ll notice it’s 100 characters long. Since each byte is written as two hex characters, that corresponds to 50 bytes, so while the user table doesn’t record the hash length explicitly, we can assume Gitea generates 50 byte hashes. The salt is in the record as well; the only remaining questions are how many iterations pbkdf2 runs, and which underlying hash function it uses (this is often SHA-1 or SHA-256).
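The length arithmetic is easy to double-check in Python: decoding the hex string back to raw bytes and measuring it should give the key length. This small sketch uses the (fake) hash from the dump above; hex_digest_length_bytes is a hypothetical helper name.

```python
# Hash copied from the (fake) user record dumped above.
STORED_PASSWD = (
    "056577a98e56c10f7084f2916c163785e409d3fb"
    "9f8f5251ec747f24d639f6ae73750f29da068a09"
    "0ef24c4bfc115deb178c"
)


def hex_digest_length_bytes(hex_string):
    """Each byte is two hex characters, so decode to bytes and measure."""
    return len(bytes.fromhex(hex_string))


print(len(STORED_PASSWD))                      # 100 hex characters
print(hex_digest_length_bytes(STORED_PASSWD))  # 50 bytes
```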

To find out how many iterations are used, we can fortunately examine Gitea’s source code. In the models directory of the git repository, in the user.go file, we find the hashPassword function on line 464:

func hashPassword(passwd, salt, algo string) string {
	var tempPasswd []byte

	switch algo {
	case algoBcrypt:
		tempPasswd, _ = bcrypt.GenerateFromPassword([]byte(passwd), bcrypt.DefaultCost)
		return string(tempPasswd)
	case algoScrypt:
		tempPasswd, _ = scrypt.Key([]byte(passwd), []byte(salt), 65536, 16, 2, 50)
	case algoArgon2:
		tempPasswd = argon2.IDKey([]byte(passwd), []byte(salt), 2, 65536, 8, 50)
	case algoPbkdf2:
		fallthrough
	default:
		tempPasswd = pbkdf2.Key([]byte(passwd), []byte(salt), 10000, 50, sha256.New)
	}

	return fmt.Sprintf("%x", tempPasswd)
}

First, we see that the default password hashing algorithm is pbkdf2, as expected. While the arguments to pbkdf2.Key aren’t named at the call site, we can quickly guess (or confirm in the Go documentation for golang.org/x/crypto/pbkdf2, where the signature is Key(password, salt []byte, iter, keyLen int, h func() hash.Hash)) that it always performs 10000 iterations, produces a 50 byte key (matching the length we worked out above), and uses SHA-256 as the underlying hash function. This is excellent, as all of this is relatively straightforward to implement elsewhere.

Now, back to the original problem of adding HTTP basic auth to NGINX using the Gitea database. We could write an NGINX module in C, but for me writing C is often slow going, error prone, and more challenging than writing Python. Fortunately, NGINX offers administrators another way to add authentication to their web servers: subrequest authentication (the auth_request directive). In a nutshell, to perform authentication NGINX sends all or part of the incoming request (usually just the headers) to another web server or sub-URL, and uses the status code of that subrequest (2xx for valid authentication, 401/403 for bad authentication) to decide whether the given authentication data was good or bad.

This means that if we can implement a very small web service on our host which reads the HTTP basic auth data from incoming requests, searches the Gitea database for a matching user, checks the incoming password against the stored hash, and returns the correct status code, we’re probably golden. For challenges like this, I really love using the Bottle web framework. One strong reason to prefer Bottle here is that it is a single Python file, supports both Python 2 and 3, and has no dependencies outside the standard library. This means that, so long as everything else also comes from the Python standard library, we can simply “vendorize” our copy of Bottle, and avoid either adding, removing, or altering global Python packages or using a Python virtualenv in our deployment.
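Of that list, reading the basic auth data is the only part that touches HTTP itself, and it is simple: per the Basic scheme, the client sends Authorization: Basic followed by base64("username:password"). Bottle does this parsing for us, but as a rough standard-library sketch of what happens (parse_basic_auth and status_for are hypothetical names), note that the password may itself contain colons, so only the first colon separates the two parts:

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Split an 'Authorization: Basic ...' value into (username, password).

    Returns None if the header is missing or malformed.
    """
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # Split on the first colon only; passwords may contain colons.
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


def status_for(header_value, check):
    """Map credentials to the status code NGINX's auth_request inspects."""
    creds = parse_basic_auth(header_value)
    if creds is None or not check(*creds):
        return 401
    return 200
```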

In my Python code, the first thing I built was the code to hash (and check) passwords the same way Gitea does. We know the passwords all use pbkdf2 with SHA-256, a salt, 10000 iterations, and a 50 byte key length, and are stored as hexadecimal. The built-in hashlib module provides the pbkdf2_hmac function, which does pretty much what we need; we can combine it with the hexlify function from the binascii module, since pbkdf2_hmac returns bytes rather than hex digits. The code to generate hashes is thus:

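import binascii
import hashlib

def do_gitea_pbkdf2(candidate_password, salt):
    # PBKDF2-HMAC-SHA256, 10000 iterations, 50 byte key: Gitea's defaults.
    hashed = hashlib.pbkdf2_hmac(
        'sha256', bytes(candidate_password, encoding='utf-8'),
        bytes(salt, encoding='utf-8'), 10000, 50
    )
    # Gitea stores the hash as lowercase hex, so encode ours the same way.
    return binascii.hexlify(hashed).decode('ascii')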

To validate a password, all that remains is to compare this hash with the passwd value stored in the database.
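A plain == works for that comparison, but the standard library’s hmac.compare_digest is the usual choice for comparing secrets, since its running time doesn’t depend on how many leading characters match. This is a sketch of that variant; verify_gitea_pbkdf2 is a hypothetical helper, and do_gitea_pbkdf2 is repeated so the block runs on its own.

```python
import binascii
import hashlib
import hmac


def do_gitea_pbkdf2(candidate_password, salt):
    # Same parameters as Gitea's default branch: PBKDF2-HMAC-SHA256,
    # 10000 iterations, 50 byte key, hex encoded.
    hashed = hashlib.pbkdf2_hmac(
        "sha256", candidate_password.encode("utf-8"),
        salt.encode("utf-8"), 10000, 50
    )
    return binascii.hexlify(hashed).decode("ascii")


def verify_gitea_pbkdf2(candidate_password, salt, stored_hex):
    """Constant-time comparison against the passwd column's hex string."""
    candidate_hex = do_gitea_pbkdf2(candidate_password, salt)
    return hmac.compare_digest(candidate_hex, stored_hex)
```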

Next, we need to retrieve the user rows from the database. We can use Python’s built-in sqlite3 module (which grew out of pysqlite) to open and search the database. Since we want to find matching users who are permitted to log in, we can use the following SELECT statement:

SELECT * FROM user 
WHERE (lower_name = :un OR name = :un OR email = :un) 
  AND is_active = 1 
  AND type = 0 
  AND prohibit_login = 0

where the values starting with : are named parameters, substituted later. We can combine this statement with some Python glue code to perform the password check, using our earlier do_gitea_pbkdf2 function:

from sqlite3 import dbapi2 as sqlite

# dict_factory used to return dictionaries
# instead of tuples from SQLite queries to
# ease getting specific column values later.
def dict_factory(cursor, row):
    d = dict()
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

def create_connection(database_url):
    # create a new SQLite3 connection 
    # with the dict row factory instead of the default factory.
    connection = sqlite.connect(database_url)
    connection.row_factory = dict_factory
    return connection

def check_pass(connection, username, passwd):
    cursor = connection.cursor()
    try:
        cursor.execute(
            "SELECT * FROM user WHERE (lower_name = :un OR name = :un OR email = :un)"
            " AND is_active = 1 AND type = 0 AND prohibit_login = 0",
            {"un": username.strip()}
        )
        result = cursor.fetchone()
        if result:
            if result['passwd_hash_algo'] == "pbkdf2":
                # If Gitea used pbkdf2 to hash the password, hash the
                # incoming password the same way and compare.
                if do_gitea_pbkdf2(passwd, result['salt']) == \
                        result['passwd']:
                    # The hash matches the incoming user password
                    return True
                else:
                    # The hash did not match the incoming user password
                    return False
            else:
                # Don't know how to hash this, just default to
                # not allowing the user to log in.
                # This could happen if bcrypt, etc. were used to
                # hash instead, but could be handled with more
                # code.
                return False
        else:
            # No such user in the database.
            return False
    finally:
        cursor.close()
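Before pointing this at a real Gitea database, it can be exercised against a throwaway in-memory one. The sketch below builds a hypothetical, cut-down user table containing only the columns the query touches, inserts fake users hashed the same way Gitea would, and runs a compact restatement of check_pass against it. It uses the standard library’s sqlite3.Row row factory, which allows row["column"] lookups without a custom dict_factory.

```python
import binascii
import hashlib
import hmac
import sqlite3


def do_gitea_pbkdf2(candidate_password, salt):
    hashed = hashlib.pbkdf2_hmac(
        "sha256", candidate_password.encode("utf-8"),
        salt.encode("utf-8"), 10000, 50
    )
    return binascii.hexlify(hashed).decode("ascii")


# Hypothetical subset of Gitea's user table: only the columns queried below.
FAKE_SCHEMA = """
CREATE TABLE user (
    id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    lower_name TEXT NOT NULL,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    passwd TEXT NOT NULL,
    passwd_hash_algo TEXT DEFAULT 'pbkdf2' NOT NULL,
    type INTEGER NULL,
    salt TEXT NULL,
    is_active INTEGER NULL,
    prohibit_login INTEGER DEFAULT 0 NOT NULL
);
"""


def build_fake_gitea_db(users):
    """users: iterable of (name, email, password, salt, is_active) tuples."""
    connection = sqlite3.connect(":memory:")
    connection.row_factory = sqlite3.Row
    connection.executescript(FAKE_SCHEMA)
    for name, email, password, salt, is_active in users:
        connection.execute(
            "INSERT INTO user (lower_name, name, email, passwd, type, salt, is_active)"
            " VALUES (?, ?, ?, ?, 0, ?, ?)",
            (name.lower(), name, email, do_gitea_pbkdf2(password, salt),
             salt, is_active),
        )
    connection.commit()
    return connection


def check_pass(connection, username, passwd):
    # Same query as above; the named parameter :un is reused three times.
    row = connection.execute(
        "SELECT * FROM user WHERE (lower_name = :un OR name = :un OR email = :un)"
        " AND is_active = 1 AND type = 0 AND prohibit_login = 0",
        {"un": username.strip()},
    ).fetchone()
    if row is None or row["passwd_hash_algo"] != "pbkdf2":
        return False
    return hmac.compare_digest(do_gitea_pbkdf2(passwd, row["salt"]), row["passwd"])
```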

We’re almost there! The last thing to do is bring in Bottle and make use of it. An easy (but perhaps not the most robust or best) way to do this is to “vendorize” it: create a “vendor” directory in your project containing an empty __init__.py file, and place the bottle.py file in this directory. You can then load Bottle with the following:

from vendor import bottle

This is not always the nicest or best way to bring in packages (if you can, it’s usually better to use proper package management, probably pip for Python), but it can be an easy way to bring something in with minimal fuss.

Now that we have Bottle, we can create a small app that authenticates against the database and returns an appropriate status:

from vendor import bottle

def build_app(database_url):
    app = bottle.Bottle()
    connection = create_connection(database_url)
    
    # Form a "partially applied" function,
    # as bottle.auth_basic expects a check function that takes
    # only a username and password, but we also need
    # to feed in the connection.
    auth_partial = lambda un, pw: check_pass(connection, un, pw)

    @app.route("/auth", name="auth_view")
    @bottle.auth_basic(auth_partial)
    def auth_view():
        return bottle.HTTPResponse(
            status=200,
            body="success"
        )

    return app
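If vendoring Bottle isn’t an option, the same endpoint can be sketched as a plain WSGI application using only the standard library. This is a hypothetical alternative, not the code from the repository linked below: make_auth_app takes any check(username, password) callable, such as a lambda wrapping check_pass, and answers every path with 200 or 401. The 401 carries a WWW-Authenticate header so the browser shows its login prompt; as far as I know, NGINX’s auth_request passes that header from the subrequest through to the client.

```python
import base64
import binascii


def make_auth_app(check):
    """Return a WSGI app: 200 for valid Basic credentials, otherwise 401."""
    def app(environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, encoded = header.partition(" ")
        creds = None
        if scheme.lower() == "basic":
            try:
                decoded = base64.b64decode(encoded.strip(), validate=True)
                username, sep, password = decoded.decode("utf-8").partition(":")
                if sep:
                    creds = (username, password)
            except (binascii.Error, UnicodeDecodeError):
                creds = None
        if creds is not None and check(*creds):
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"success"]
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="private"'),
        ])
        return [b"Access denied"]
    return app
```

To serve it on the loopback interface, something like wsgiref.simple_server.make_server("127.0.0.1", 9091, app).serve_forever() would do; note that wsgiref’s server handles one request at a time, which also keeps a single shared sqlite3 connection safe.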

Excellent, that’s pretty much all we need. The full code for a minimal working application implementing this may be found here: https://github.com/cope-systems/bottle-gitea-auth-example.

All that’s left now is to clone the repository, set up an appropriate systemd (or similar) service file, and add the authentication to your NGINX setup. I chose to run my service on port 9091, bound only to the loopback interface (127.0.0.1). Once it’s running, all that’s needed to protect your NGINX site is the following:

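server {
    ...
    auth_request /auth;

    location /auth {
        # Use the address the service is bound to; "localhost" may
        # also resolve to ::1, where nothing is listening.
        proxy_pass http://127.0.0.1:9091/auth;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }

}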

All pages should now require HTTP basic auth, using the same logins as your Gitea instance. This provides at least a minimal amount of security for anything else being served from this web server, with relatively little hassle. This example could also be extended to work with many other forms of authentication (other databases/applications), and might be a good alternative to LDAP and htpasswd files for many applications.

Questions or Comments? Post below.

A Design Pattern Idea For Python Micro Web Frameworks

I often find myself prototyping new ideas with either the Bottle or Flask “microframeworks”. While they do facilitate quick and flexible design, one characteristic of both frameworks that I dislike is the usual reliance on a global application object and globally decorated functions as the “views” for routes. An example of this style:

from bottle import Bottle

app = Bottle()

# ....

@app.route("/", name="index_view", method=["GET"])
def index_view():
    return "Hello!"

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080)

For applications contained in a single file, with few or no outside dependencies (and likely few or no tests), this is OK. However, consider what happens once this is broken into two files, one main.py:


from bottle import Bottle
from views import *

app = Bottle()

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080)

and views.py:

from main import app
import random

random.seed()

@app.route("/", name="index_view", method=["GET"])
def index_view():
    return "Hello!"

@app.route("/randomNumber", name="random_number", method=["GET"])
def random_number():
    return "Random Number: {0}".format(random.random())

Notice that we now depend on circular imports to load our views: main.py imports views.py, which imports main.py. Circular imports often cause unexpected behavior, and are usually a design smell in an application (some good further discussion on circular imports in Python can be found here). This example is a case in point: run as python main.py, views.py imports a second copy of the module under the name main and registers its routes on that copy’s app, while the script goes on to create and run a fresh app with no routes at all. The decorator-and-function view style of Bottle and Flask, in which a view’s arguments come only from URL parameters (combined with the global nature of the functions), also makes injecting objects like database connections difficult. In Flask, applications often use the thread-local current_app proxy together with the g object, attaching things to it at will in order to get at dependencies like database connections. I find this often leads to messy and confusing applications, where it’s unclear when (or if) a dependency has been added.

As an alternative, I started making my views closures, as a means to restrict when the views are decorated and to provide a clear and easy way to add dependencies like database connections (views.py):

# Ideally your queries should not live in the same place as your views.
def select_thing_count(db_connection):
    # In a real application you would be doing a db query here.
    return {"thingCount": 400}

def create_index_view(app):
    @app.route("/", name="index_view", method=["GET"])
    def index_view():
        return "Hello!"
    return index_view

def create_status_view(app, db_connection):
    @app.route("/status", name="status_view", method=["GET"])
    def status_view():
        thing_count_dict = select_thing_count(db_connection)
        return "Thing count: {0}".format(thing_count_dict["thingCount"])
    return status_view

We can similarly alter main.py to remove the global application object, and instead construct the application inside of a function:

from bottle import Bottle
from views import create_status_view, create_index_view

def create_db_connection(db_connection_info):
    # Do your db specific stuff to instantiate a connection
    # instead of returning None
    db_connection = None
    return db_connection

def create_my_app(db_connection_info):
    db_connection = create_db_connection(db_connection_info)

    app = Bottle()
    # The index view needs no dependencies; the status view gets the connection.
    create_index_view(app)
    create_status_view(app, db_connection)
    return app

if __name__ == "__main__":
    # In a real application load your specific
    # DB connection data, probably from an argument parser or
    # similar.
    db_connection_info = {
        "host": "127.0.0.1",
        "user": "foo",
        "password": "bar",
        "database_name": "baz"
    }
    my_app = create_my_app(db_connection_info)
    my_app.run(host="127.0.0.1", port=8080)

This significantly clarifies the application and view lifecycle, and makes it easier to see which dependencies are present and where they come from (like our database connection). It also makes dependency injection for tests easier to understand and do. Using closures this way works relatively well for small applications, but it is a little clunky and gets painful when you have many related views that belong together.
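As an illustration of the testing point, the closure can be built and called without Bottle at all. In this sketch RecordingApp and FakeConnection are hypothetical stand-ins: the first mimics only the decorator form of app.route used above, and the second replaces the database. select_thing_count is adapted to read from the fake so that the injected value actually shows up in the response.

```python
class RecordingApp(object):
    """Hypothetical stand-in for bottle.Bottle that only records routes."""

    def __init__(self):
        self.routes = {}

    def route(self, path, name=None, method="GET"):
        def decorator(callback):
            self.routes[name] = (path, callback)
            return callback
        return decorator


class FakeConnection(object):
    """Hypothetical fake database connection injected for the test."""

    def __init__(self, thing_count):
        self.thing_count = thing_count


def select_thing_count(db_connection):
    # Adapted for the sketch: read from the fake instead of querying a db.
    return {"thingCount": db_connection.thing_count}


def create_status_view(app, db_connection):
    @app.route("/status", name="status_view", method=["GET"])
    def status_view():
        thing_count_dict = select_thing_count(db_connection)
        return "Thing count: {0}".format(thing_count_dict["thingCount"])
    return status_view
```

Because the factory returns the view, a test can simply call it: create_status_view(RecordingApp(), FakeConnection(7))() returns "Thing count: 7".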

In order to further refine this, especially when there are many views, I use a similar pattern, using a class instead of a closure for similar effect (views.py):

# queries.py and procedures.py are omitted, but should contain any
# database queries and formatting helpers (or anything similar) your views use.
from queries import load_all_things, load_all_users, load_user_by_id
from procedures import format_thing, format_user

class MyApplicationViews(object):
    def __init__(self, db_connection):
        self.db_connection = db_connection

    def attach_to_app(self, app):
        app.route(
            "/", name="index_view",
            method=["GET"], callback=self.index_view
        )
        app.route(
            "/things", name="get_all_things_view",
            method=["GET"], callback=self.get_all_things_view
        )
        app.route(
            "/users", name="get_all_users_view",
            method=["GET"], callback=self.get_all_users_view
        )
        # Bottle passes the <user_id> wildcard to the view as a keyword argument.
        app.route(
            "/user/<user_id>", name="get_user_by_id_view",
            method=["GET"], callback=self.get_user_by_id_view
        )
        # ... route the rest of the views here.

    def index_view(self):
        return "Hello!"

    def get_all_things_view(self):
        things = load_all_things(self.db_connection)
        formatted_things = [format_thing(t) for t in things]
        return "Things: " + "\n".join(formatted_things)

    def get_all_users_view(self):
        users = load_all_users(self.db_connection)
        formatted_users = [format_user(u) for u in users]
        return "Users: " + "\n".join(formatted_users)

    def get_user_by_id_view(self, user_id):
        user = load_user_by_id(self.db_connection, user_id)
        if not user:
            return "No such user: {0}".format(user_id)
        else:
            return format_user(user)

    # ... more view methods here

Adding further views is easier, and there’s a clear precedent for how to group related views and add extra dependencies (again, like our database connection). As before, our main.py file loads the view class and applies it in the application creation function:

from bottle import Bottle
from views import MyApplicationViews

def create_db_connection(db_connection_info):
    # Do your db specific stuff to instantiate a connection
    # instead of returning None
    db_connection = None
    return db_connection

def create_my_app(db_connection_info):
    db_connection = create_db_connection(db_connection_info)
 
    app = Bottle()
    views = MyApplicationViews(db_connection)
    views.attach_to_app(app)
    return app

if __name__ == "__main__":
    # In a real application load your specific
    # DB connection data, probably from an argument parser or
    # similar.
    db_connection_info = {
        "host": "127.0.0.1",
        "user": "foo",
        "password": "bar",
        "database_name": "baz"
    }
    my_app = create_my_app(db_connection_info)
    my_app.run(host="127.0.0.1", port=8080)

The class-based definition makes reusing and simplifying code much easier, and provides a clear and easy way to split up types of views (e.g. all routes for a REST API go into one “views” class, routes for statically generated pages go into another, etc.). I find that this leads to cleaner, easier to maintain code, which is especially important with microframeworks, whose extreme freedom in design also makes it easy to make a mess.
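A minimal sketch of that split, under a few assumptions: ApiViews and PageViews are hypothetical groups, RecordingApp is a hypothetical stand-in that records app.route(...) calls made with callback=, and create_my_app takes the app as a parameter (in real code you would pass bottle.Bottle()) so the sketch runs without Bottle installed. A plain dict stands in for the database connection.

```python
class RecordingApp(object):
    """Hypothetical stand-in for bottle.Bottle: records app.route(...) calls."""

    def __init__(self):
        self.routes = {}

    def route(self, path, name=None, method="GET", callback=None):
        self.routes[name] = (path, method, callback)


class ApiViews(object):
    """Hypothetical REST API views; they need the database connection."""

    def __init__(self, db_connection):
        self.db_connection = db_connection

    def attach_to_app(self, app):
        app.route("/api/users", name="api_users_view",
                  method=["GET"], callback=self.api_users_view)

    def api_users_view(self):
        return {"users": sorted(self.db_connection["users"])}


class PageViews(object):
    """Hypothetical static page views, with no database dependency at all."""

    def attach_to_app(self, app):
        app.route("/", name="index_view",
                  method=["GET"], callback=self.index_view)

    def index_view(self):
        return "Hello!"


def create_my_app(app, db_connection):
    # Each group of views receives only the dependencies it needs.
    for views in (ApiViews(db_connection), PageViews()):
        views.attach_to_app(app)
    return app
```

Each class can also be unit tested on its own, e.g. ApiViews({"users": ["bob", "alice"]}).api_users_view() without any app involved.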

Questions or comments? Think there might be a better way to do this or improve upon this? Post below!