We have a jira instance which uses the postgres database to store its information. The server had roughly 50 Gb of disk, so it ran out of space before we noticed. We extended the partition to 100 Gb but apparently it was too late.
Jira started complaining about it with an 500 internal server error (Estado HTTP 500 – Could not determine database type. (Conexión rechazada. Verifique que el nombre del Host y el puerto sean correctos y que postmaster este aceptando conexiones TCP/IP.)). Viewing the logs we found Caused by: java.net.ConnectException: Connection refused
We checked our database and database settings for external connections and couldn’t connect.
Then we tried to start the postgres service
sudo service postgresql start
2018-12-04 09:01:59.268 CST [21281] LOG: database system was interrupted while in recovery at 2018-12-04 08:59:20 CST 2018-12-04 09:01:59.268 CST [21281] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. 2018-12-04 09:01:59.631 CST [21281] LOG: database system was not properly shut down; automatic recovery in progress 2018-12-04 09:01:59.636 CST [21281] LOG: redo starts at 7/CC020C68 2018-12-04 09:01:59.639 CST [21281] LOG: record with zero length at 7/CC07E998 2018-12-04 09:01:59.639 CST [21281] LOG: redo done at 7/CC07E968 2018-12-04 09:01:59.639 CST [21281] LOG: last completed transaction was at log time 2018-12-03 10:29:49.99078-06 2018-12-04 09:01:59.641 CST [21281] FATAL: btree level 1 not found in index "19881"
VITALLY IMPORTANT FIRST RESPONSE
I did a backup of the postgres data files
ssh admin@jira.server
sudo tar cvf postgres.bkp.tar /var/lib/postgresql/
Copied it to my local machine
rsync -av admin@jira.server:postgres.bkp.tar .
then installed the same version (series) that is on the server into my local machine.
Since I have ubuntu 18.04 I had to add the postgres repository to install postgres version 9.3.
echo «deb http://apt.postgresql.org/pub/repos/apt/ bionic-pgdg main» | sudo tee /etc/apt/sources.list.d/pgdg.list
wget –quiet -O – https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add –
sudo apt update
sudo apt install postgresql-9.3 postgresql-contrib-9.3
Then I tried to run postgres locally with the backup data
cd /tmp
tar xvf postgres.bkp.tar
sudo su postgres
/usr/lib/postgresql/9.3/bin/postgres -D /tmp/var/lib/postgresql/9.3/main/
it complained about not finding the configuration file so I just copied it from the default dirs
cp /etc/postgresql/10/main/postgresql.conf /tmp/var/lib/postgresql/9.3/main/
cp -R /etc/postgresql/10/main/conf.d/ /tmp/postgres/var/lib/postgresql/9.3/main/
edit the postgresql.conf file and change the data dir appropiately
data_directory = ‘/tmp/var/lib/postgresql/9.3/main’
then I tried again (but this time in single mode) and it reproduced exactly the same error as on server. Great!
postgres@elite-tsj:/tmp$ /usr/lib/postgresql/9.3/bin/postgres --single -P -d 1 -D /tmp/var/lib/postgresql/9.3/main/ 2018-12-04 09:01:59.268 CST [21281] LOG: database system was interrupted while in recovery at 2018-12-04 08:59:20 CST 2018-12-04 09:01:59.268 CST [21281] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. 2018-12-04 09:01:59.631 CST [21281] DEBUG: checkpoint record is at 7/CC025C50 2018-12-04 09:01:59.631 CST [21281] DEBUG: redo record is at 7/CC020C68; shutdown FALSE 2018-12-04 09:01:59.631 CST [21281] DEBUG: next transaction ID: 0/9547804; next OID: 227497 2018-12-04 09:01:59.631 CST [21281] DEBUG: next MultiXactId: 3; next MultiXactOffset: 5 2018-12-04 09:01:59.631 CST [21281] DEBUG: oldest unfrozen transaction ID: 676, in database 1 2018-12-04 09:01:59.631 CST [21281] DEBUG: oldest MultiXactId: 1, in database 1 2018-12-04 09:01:59.631 CST [21281] DEBUG: transaction ID wrap limit is 2147484323, limited by database with OID 1 2018-12-04 09:01:59.631 CST [21281] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1 2018-12-04 09:01:59.631 CST [21281] LOG: database system was not properly shut down; automatic recovery in progress 2018-12-04 09:01:59.634 CST [21281] DEBUG: resetting unlogged relations: cleanup 1 init 0 2018-12-04 09:01:59.636 CST [21281] LOG: redo starts at 7/CC020C68 2018-12-04 09:01:59.639 CST [21281] LOG: record with zero length at 7/CC07E998 2018-12-04 09:01:59.639 CST [21281] LOG: redo done at 7/CC07E968 2018-12-04 09:01:59.639 CST [21281] LOG: last completed transaction was at log time 2018-12-03 10:29:49.99078-06 2018-12-04 09:01:59.639 CST [21281] DEBUG: resetting unlogged relations: cleanup 0 init 1 2018-12-04 09:01:59.641 CST [21281] FATAL: btree level 1 not found in index "19881"
since this data is a copy i just executed pg_resetxlog as shown here.
Be aware that use of the pg_resetxlog utility should be done as an absolute last resort, and there are still some things you should try first..
/usr/lib/postgresql/9.3/bin/pg_resetxlog /tmp/var/lib/postgresql/9.3/main/
it complained I had to force it, there you go
/usr/lib/postgresql/9.3/bin/pg_resetxlog -f /tmp/var/lib/postgresql/9.3/main/
done! Now let’s see if it’s working again
sudo su postgres
/usr/lib/postgresql/9.3/bin/postgres -D /tmp/var/lib/postgresql/9.3/main/
boom! it’s working.
Now it’s time to do a backup with sql dump (pg_dump) or better yet a full sql dump (pg_dumpall).
In another terminal (or tab)
sudo su postgres
pg_dumpall -p 5433 > /tmp/postgres-full.dmp
in my case I had to specify the port (-p 5433).
Now that I have a full backup I can try the same on production.
/usr/lib/postgresql/9.3/bin/pg_resetxlog -f /var/lib/postgresql/9.3/main/
sudo service postgresql start
thank the gods! (to recall the option pointed here)