I want to share a scenario I ran into and a quick hack to solve it: I administer a system on the internet, which hosts some private git repositories for friends, using Gitea, with NGINX as the outward-facing web server (between Gitea and the internet). For a while this worked fine, as there were no other pages or applications hosted on this server, and I could rely strictly on Gitea’s local authentication and user management.
I ran into a problem recently, however, in that Gitea doesn’t seem to have many real user-authoring or wiki-like features built in, and there was a need to add a wiki to enhance collaboration on a shared project. I really like the lightweight Oddmuse wiki software, but by default it doesn’t ship with authentication built in, and I really wanted a single unified system of authentication for this server.
I decided I was OK utilizing HTTP basic auth (which is pretty secure so long as your connections are all HTTPS). A very common way to make HTTP basic auth work is utilizing “htpasswd” files (I believe these originated with Apache HTTPD, but have been long supported in NGINX and Lighttpd, among other webservers). This works OK sometimes, but Gitea stores authentication data differently and with different hash formats (in its own database), and in general I’ve found that keeping these updated and synchronized is hard. If someone wants to reset their password, you need to manually go update the htpasswd file, or invent some other way to handle this (usually ugly). You can read more about htpasswd style authentication for NGINX here.
Another typical choice for adding authentication to web servers is to utilize LDAP. While this is a very complete and robust solution, I have found LDAP to be an absolute nightmare to setup and administer (or even understand), and it feels relatively heavy-weight for a scenario such as this. For a larger group of people or many servers, this is likely appropriate, but not what I want to use here, as I value my time enough to not go figure out all of the complexity of LDAP again.
At this point, I wanted to see how Gitea stores its users and authentication data. I had initially thought to write an NGINX extension in C if I could figure out how Gitea manages users and authentication, and use this for authentication. I utilize a SQLite3 database with Gitea, as the system is relatively low volume. Enumerating the tables Gitea has in its database (typically stored at /var/lib/gitea/data/gitea.db if you’re using SQLite3) using the handy sqlite3 command line tool yields the following:
sqlite> .tables
access                     oauth2_grant
access_token               oauth2_session
action                     org_user
attachment                 protected_branch
collaboration              public_key
comment                    pull_request
commit_status              reaction
deleted_branch             release
deploy_key                 repo_indexer_status
email_address              repo_redirect
external_login_user        repo_topic
follow                     repo_unit
gpg_key                    repository
gpg_key_import             review
hook_task                  star
issue                      stopwatch
issue_assignees            task
issue_dependency           team
issue_label                team_repo
issue_user                 team_unit
issue_watch                team_user
label                      topic
lfs_lock                   tracked_time
lfs_meta_object            two_factor
login_source               u2f_registration
milestone                  upload
mirror                     user
notice                     user_open_id
notification               version
oauth2_application         watch
oauth2_authorization_code  webhook
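If you would rather do the same enumeration from Python, here is a minimal sketch using the standard library sqlite3 module. The helper name list_tables is my own invention for illustration, and it assumes you hand it an already-open connection (for example, one opened against the gitea.db path mentioned above).

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite database, sorted."""
    # sqlite_master is SQLite's built-in catalog of schema objects.
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Something like list_tables(sqlite3.connect("/var/lib/gitea/data/gitea.db")) should then give roughly the same names as the .tables command, just as a flat Python list.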
So, there are many tables here, but it turns out (for local authentication) the user table has pretty much what we need. Here is the schema for the user table:
sqlite> .schema user
CREATE TABLE `user` (
    `id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    `lower_name` TEXT NOT NULL,
    `name` TEXT NOT NULL,
    `full_name` TEXT NULL,
    `email` TEXT NOT NULL,
    `keep_email_private` INTEGER NULL,
    `email_notifications_preference` TEXT DEFAULT 'enabled' NOT NULL,
    `passwd` TEXT NOT NULL,
    `passwd_hash_algo` TEXT DEFAULT 'pbkdf2' NOT NULL,
    `must_change_password` INTEGER DEFAULT 0 NOT NULL,
    `login_type` INTEGER NULL,
    `login_source` INTEGER DEFAULT 0 NOT NULL,
    `login_name` TEXT NULL,
    `type` INTEGER NULL,
    `location` TEXT NULL,
    `website` TEXT NULL,
    `rands` TEXT NULL,
    `salt` TEXT NULL,
    `language` TEXT NULL,
    `description` TEXT NULL,
    `created_unix` INTEGER NULL,
    `updated_unix` INTEGER NULL,
    `last_login_unix` INTEGER NULL,
    `last_repo_visibility` INTEGER NULL,
    `max_repo_creation` INTEGER DEFAULT -1 NOT NULL,
    `is_active` INTEGER NULL,
    `is_admin` INTEGER NULL,
    `allow_git_hook` INTEGER NULL,
    `allow_import_local` INTEGER NULL,
    `allow_create_organization` INTEGER DEFAULT 1 NULL,
    `prohibit_login` INTEGER DEFAULT 0 NOT NULL,
    `avatar` TEXT NOT NULL,
    `avatar_email` TEXT NOT NULL,
    `use_custom_avatar` INTEGER NULL,
    `num_followers` INTEGER NULL,
    `num_following` INTEGER DEFAULT 0 NOT NULL,
    `num_stars` INTEGER NULL,
    `num_repos` INTEGER NULL,
    `num_teams` INTEGER NULL,
    `num_members` INTEGER NULL,
    `visibility` INTEGER DEFAULT 0 NOT NULL,
    `repo_admin_change_team_access` INTEGER DEFAULT 0 NOT NULL,
    `diff_view_style` TEXT DEFAULT '' NOT NULL,
    `theme` TEXT DEFAULT '' NOT NULL
);
CREATE UNIQUE INDEX `UQE_user_name` ON `user` (`name`);
CREATE UNIQUE INDEX `UQE_user_lower_name` ON `user` (`lower_name`);
CREATE INDEX `IDX_user_created_unix` ON `user` (`created_unix`);
CREATE INDEX `IDX_user_updated_unix` ON `user` (`updated_unix`);
CREATE INDEX `IDX_user_last_login_unix` ON `user` (`last_login_unix`);
CREATE INDEX `IDX_user_is_active` ON `user` (`is_active`);
Examining the schema, we see that the information we probably need to authenticate users is likely stored entirely in this table. Great! We probably want to pay attention to the name, passwd (probably the hash value), passwd_hash_algo, type, salt, is_active, and prohibit_login columns. A quick dump of the users yields user records such as:
sqlite> select * from user;
                            id = 1
                    lower_name = foobar
                          name = Foobar
                     full_name = Foo Bar
                         email = foo@bar.com
            keep_email_private = 0
email_notifications_preference = enabled
                        passwd = 056577a98e56c10f7084f2916c163785e409d3fb9f8f5251ec747f24d639f6ae73750f29da068a090ef24c4bfc115deb178c
              passwd_hash_algo = pbkdf2
          must_change_password = 0
                    login_type = 0
                  login_source = 0
                    login_name =
                          type = 0
                      location =
                       website =
                         rands = CGkQd8yAmC
                          salt = oZM0lIBZQz
                      language = en-US
                   description =
                  created_unix = 1574639131
                  updated_unix = 1574639131
               last_login_unix = 1574639131
          last_repo_visibility = 0
             max_repo_creation = -1
                     is_active = 1
                      is_admin = 0
                allow_git_hook = 0
            allow_import_local = 0
     allow_create_organization = 1
                prohibit_login = 0
                        avatar = c8af9bdacc70eceaade55fe2b572daa3
                  avatar_email = foo@bar.com
             use_custom_avatar = 0
                 num_followers = 999
                 num_following = 999
                     num_stars = 999
                     num_repos = 999
                     num_teams = 0
                   num_members = 0
                    visibility = 0
 repo_admin_change_team_access = 0
               diff_view_style =
                         theme = gitea
A couple of things to notice here: this (fake) user, and all of the users in the Gitea database by default, use the pbkdf2 hashing algorithm, which is fortunately relatively strong and pretty common (Python’s built-in hashlib comes with support for pbkdf2 out of the box). If you count the number of hex characters in the password string, you’ll notice it’s 100 characters long; since each byte is written as two hex characters, 100 hex characters is equivalent to a 50 byte string. So while the user table doesn’t indicate the hash value length explicitly, we can assume it’s probably generating 50 byte hashes. We see the salt in the record as well; the only questions now are how many rounds the hash algorithm does, and what the base hashing algorithm used by pbkdf2 is (this is often SHA-1 or SHA-256).
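The hex-to-bytes arithmetic is easy to double check in Python. This is just a small sketch; the helper name hex_length_in_bytes is made up for illustration, and the sample value is the passwd column from the fake record above.

```python
import binascii


def hex_length_in_bytes(hex_string):
    """Decode a hex string and report how many raw bytes it represents."""
    # Two hex characters encode one byte, so this is len(hex_string) // 2
    # for any valid, even-length hex string.
    return len(binascii.unhexlify(hex_string))


# The passwd value from the fake user record shown earlier.
sample_passwd = (
    "056577a98e56c10f7084f2916c163785e409d3fb9f8f5251ec"
    "747f24d639f6ae73750f29da068a090ef24c4bfc115deb178c"
)
sample_length = hex_length_in_bytes(sample_passwd)
```

Using unhexlify rather than plain division has the small side benefit of raising an error if the stored value is not valid hex at all.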
To answer the question of how many rounds, we may fortunately go examine Gitea’s source code. In the models directory of the git repository, under the user.go file, we see the hashPassword function on line 464:
func hashPassword(passwd, salt, algo string) string {
	var tempPasswd []byte

	switch algo {
	case algoBcrypt:
		tempPasswd, _ = bcrypt.GenerateFromPassword([]byte(passwd), bcrypt.DefaultCost)
		return string(tempPasswd)
	case algoScrypt:
		tempPasswd, _ = scrypt.Key([]byte(passwd), []byte(salt), 65536, 16, 2, 50)
	case algoArgon2:
		tempPasswd = argon2.IDKey([]byte(passwd), []byte(salt), 2, 65536, 8, 50)
	case algoPbkdf2:
		fallthrough
	default:
		tempPasswd = pbkdf2.Key([]byte(passwd), []byte(salt), 10000, 50, sha256.New)
	}

	return fmt.Sprintf("%x", tempPasswd)
}
We first see that the default password hashing algorithm is pbkdf2, which we expect. While the parameters to the pbkdf2.Key function aren’t labeled, we can quickly guess (or look at the Go documentation for this function) that 10000 is the number of rounds (we already strongly believe the hash length is always 50 bytes, so the 50 must be the key length), and that it utilizes SHA-256 as the base hashing algorithm. This is excellent, as all of this is relatively straightforward to implement elsewhere.
Now, back to the initial problem of adding HTTP basic auth using the Gitea database in NGINX. We could write an NGINX module in C, but writing C is, for me, often slow going, error prone, and more challenging than writing Python. Fortunately there is another way NGINX allows administrators to add authentication to their webservers: subrequest authentication. In a nutshell, to perform authentication, NGINX sends all or part of the incoming request to another web server or suburl, and the status code of the response (either 2xx for valid authentication or 401/403 for bad authentication) is what NGINX uses to ascertain if the given authentication data was good or bad.
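To make that status-code contract concrete, here is a rough, standard-library-only sketch of what such an authentication endpoint has to do, written as a plain WSGI app. This is not the code I ended up deploying; the names parse_basic_auth and make_auth_app, the "private" realm, and the check_credentials callback are all placeholders I made up for illustration.

```python
import base64


def parse_basic_auth(header_value):
    """Return (username, password) from an Authorization header, or None."""
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers both malformed base64 and undecodable UTF-8.
        return None
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password


def make_auth_app(check_credentials):
    """Build a WSGI app that answers 200 or 401 for auth subrequests.

    check_credentials is a hypothetical callback taking (username, password)
    and returning True when the login should be accepted.
    """
    def app(environ, start_response):
        credentials = parse_basic_auth(environ.get("HTTP_AUTHORIZATION"))
        if credentials is not None and check_credentials(*credentials):
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"success"]
        # A 401 with WWW-Authenticate is what makes browsers prompt for a login.
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="private"'),
        ])
        return [b"unauthorized"]
    return app
```

The body of the response doesn't matter to NGINX here, only the status code does, so the whole service boils down to "parse the header, check the credentials, pick 200 or 401".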
This means that if we can implement a very small web service on our host, which can read our HTTP basic auth data from incoming requests, search the Gitea database for a matching user, check the incoming password against the stored hash, and return the correct status code, we’re probably golden. For challenges like this, I really love utilizing the Bottle web framework. One really strong reason to prefer Bottle for this is that it is a single Python file, supports both Python 2 and 3, and has no outside requirements. This means so long as everything else comes from the Python standard library, we may just “vendorize” our copy of bottle, and forgo the need to either add/remove/alter global Python packages or utilize a Python virtualenv in our deployment.
In my Python code, the first thing I built out was the code to hash (and check) passwords the same way Gitea does. We know the passwords are all pbkdf2, use SHA-256, have a salt and use 10000 rounds of hashing, have a key length of 50 bytes, and are stored as hexadecimal values. Looking at the built-in hashlib module, we see the pbkdf2_hmac function, which does pretty much what we need; we can combine this with the “hexlify” function from the binascii module, as pbkdf2_hmac yields bytes instead of hexdigits. The code to generate hashes is thus:
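import binascii
import hashlib


def do_gitea_pbkdf2(candidate_password, salt):
    # Mirror Gitea's pbkdf2.Key call: SHA-256, 10000 rounds, 50 byte key.
    hashed = hashlib.pbkdf2_hmac(
        'sha256',
        bytes(candidate_password, encoding='utf-8'),
        bytes(salt, encoding='utf-8'),
        10000,
        50
    )
    # Gitea stores the hash as lowercase hex, so return the same form.
    return binascii.hexlify(hashed).decode('ascii')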
All that is needed further to validate the hash then is to compare it to the value in the database itself.
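As a hedged aside, that comparison step could be sketched as follows. I repeat the hashing function so the snippet stands alone, and I use hmac.compare_digest for a constant-time comparison, which is a small hardening on my part and not something Gitea’s schema requires. The name verify_gitea_pbkdf2 is hypothetical.

```python
import binascii
import hashlib
import hmac


def do_gitea_pbkdf2(candidate_password, salt):
    """Hash a password with the parameters read from Gitea's source."""
    hashed = hashlib.pbkdf2_hmac(
        'sha256',
        candidate_password.encode('utf-8'),
        salt.encode('utf-8'),
        10000,  # rounds
        50,     # derived key length in bytes
    )
    return binascii.hexlify(hashed).decode('ascii')


def verify_gitea_pbkdf2(candidate_password, salt, stored_hex):
    """Return True if the candidate password hashes to the stored value."""
    computed = do_gitea_pbkdf2(candidate_password, salt)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(computed, stored_hex.lower())
```

Here stored_hex would be the passwd column and salt the salt column of the matching user row.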
The next thing to do is to figure out how to retrieve the rows out of the database itself. We can use Python’s built-in sqlite3 module (the descendant of pysqlite) to open and search the database. Since we want to find matching users in the database who are permitted to log in, we can use the following select statement:
SELECT * FROM user WHERE (lower_name = :un OR name = :un OR email = :un) AND is_active = 1 AND type = 0 AND prohibit_login = 0
where the values starting with : will be used for parameter substitution later. We can use this statement with some Python glue code to perform that password checking, using our earlier do_gitea_pbkdf2 function:
from sqlite3 import dbapi2 as sqlite


# dict_factory used to return dictionaries
# instead of tuples from SQLite queries to
# ease getting specific column values later.
def dict_factory(cursor, row):
    d = dict()
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d


def create_connection(database_url):
    # create a new SQLite3 connection
    # with the dict row factory instead of the default factory.
    connection = sqlite.connect(database_url)
    connection.row_factory = dict_factory
    return connection


def check_pass(connection, username, passwd):
    cursor = connection.cursor()
    try:
        cursor.execute(
            "SELECT * FROM user WHERE (lower_name = :un OR name = :un OR email = :un)"
            " AND is_active = 1 AND type = 0 AND prohibit_login = 0",
            {"un": username.strip()}
        )
        result = cursor.fetchone()
        if result:
            if result['passwd_hash_algo'] == "pbkdf2":
                # If gitea used pbkdf2 to hash the password...
                if do_gitea_pbkdf2(passwd, result['salt']) == \
                        result['passwd']:
                    # The hash matches the incoming user password
                    return True
                else:
                    # The hash did not match the incoming user password
                    return False
            else:
                # Don't know how to hash this, just default to
                # not allowing the user to log in.
                # This could happen if bcrypt, etc. were used to
                # hash instead, but could be handled with more
                # code.
                return False
        else:
            # No such user in the database.
            return False
    finally:
        cursor.close()
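If you want to convince yourself that the WHERE clause really filters out inactive and prohibited users, a throwaway in-memory database works well. The sketch below is an assumption-laden toy: the table only has the handful of columns the query touches (not Gitea’s real schema), the helper names build_demo_database and find_login_user are made up, and it uses sqlite3.Row in place of a custom dict factory.

```python
import sqlite3

LOGIN_QUERY = (
    "SELECT * FROM user WHERE (lower_name = :un OR name = :un OR email = :un)"
    " AND is_active = 1 AND type = 0 AND prohibit_login = 0"
)


def build_demo_database():
    """Create an in-memory database with a cut-down, fake user table."""
    connection = sqlite3.connect(":memory:")
    # sqlite3.Row allows column access by name, e.g. row["name"].
    connection.row_factory = sqlite3.Row
    connection.execute(
        "CREATE TABLE user (lower_name TEXT, name TEXT, email TEXT,"
        " is_active INTEGER, type INTEGER, prohibit_login INTEGER)"
    )
    connection.executemany(
        "INSERT INTO user VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("foobar", "Foobar", "foo@bar.com", 1, 0, 0),
            ("blocked", "Blocked", "blocked@bar.com", 1, 0, 1),
            ("inactive", "Inactive", "inactive@bar.com", 0, 0, 0),
        ],
    )
    return connection


def find_login_user(connection, username):
    """Return the matching row for a user allowed to log in, else None."""
    # The :un named parameter is bound once and reused three times.
    return connection.execute(LOGIN_QUERY, {"un": username.strip()}).fetchone()
```

With this toy data, looking up "foo@bar.com" or "Foobar" finds the first row, while "blocked" and "inactive" come back as None even though those rows exist.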
We’re almost there! The last thing to do is to bring bottle in and make use of it. An easy (but perhaps not the most robust or best) way to do this is to “vendorize” it. You may simply create a “vendor” directory in your project, with an empty __init__.py file, and place the bottle.py file into this directory. You can now load bottle by using the following:
from vendor import bottle
This is sometimes not the nicest or best way to bring in packages (if you can, it’s usually better to use proper package management, probably with pip for python), but this can be an easy way to bring something in with minimal fuss.
Now that we have bottle, we can create a simple app that simply authenticates against the database and returns an appropriate status:
from vendor import bottle


def build_app(database_url):
    app = bottle.Bottle()
    connection = create_connection(database_url)

    # form a "partially applied" function
    # as bottle expects a function that takes
    # only a username and password, but we also need
    # to feed in the connection as well.
    auth_partial = lambda un, pw: check_pass(connection, un, pw)

    @app.route("/auth", name="auth_view")
    @bottle.auth_basic(auth_partial)
    def auth_view():
        return bottle.HTTPResponse(
            status=200,
            body="success"
        )

    return app
Excellent, that’s pretty much all we need. The full code for a minimal working application implementing this may be found here: https://github.com/cope-systems/bottle-gitea-auth-example.
All that is needed now is to clone the repository, set up an appropriate systemd (or similar) service file, and add the authentication to your NGINX setup. I chose to run my service on port 9091 (bound only to the local interface, 127.0.0.1). Once it is running, all that’s needed to protect your NGINX site is the following:
server {
    ...
    auth_request /auth;

    location /auth {
        proxy_pass http://localhost:9091/auth;
        proxy_pass_request_body off;
        proxy_set_header Content-Length "";
        proxy_set_header X-Original-URI $request_uri;
    }
}
This should now make every page require HTTP basic auth, with credentials that correspond to the user logins for your Gitea instance. This provides at least a minimal amount of security to anything else being served on this web server, with relatively low hassle. This example could also be extended to work with many other forms of authentication (other databases/applications), and in general might provide a good alternative to LDAP and htpasswd files for many applications.