Difference between revisions of "SBN MPC Wiki"

From The SBN Wiki
Jump to navigation Jump to search
m
(Add "best practices" for SBN replication clients. Terms of Service (ToS) will refer to that paragraph.)
Line 73: Line 73:
 
   ALTER SUBSCRIPTION <subscription_name> REFRESH PUBLICATION;
 
   ALTER SUBSCRIPTION <subscription_name> REFRESH PUBLICATION;
  
<h2>"How to drop a database for complete re-start or version upgrade:</h2>
+
<h2>How to drop a database for complete re-start or version upgrade:</h2>
 
* If you have a publication and distribute data to somebody - drop the publication:
 
* If you have a publication and distribute data to somebody - drop the publication:
 
   [sudo -u postgres] psql <dbname>
 
   [sudo -u postgres] psql <dbname>
Line 92: Line 92:
 
   sudo yum remove postgresql11
 
   sudo yum remove postgresql11
 
* To start a new version of database go to "Replica side (from the scratch)"
 
* To start a new version of database go to "Replica side (from the scratch)"
 +
 +
<h2>Best practices for SBN Postgres replication clients:</h2>
 +
* Postgres configuration tune-up.
 +
  As a minimum: https://pgtune.leopard.in.ua/#/
 +
  More detailed: https://pgconfigurator.cybertec-postgresql.com/
 +
* Hardware.
 +
  <b>SSD storage for data-dir.</b>
 +
  Based on experience, it's the most critical for better performance of the database and replication.
 +
  Even more important, than network bandwidth.
 +
  <b>RAM memory.</b>
 +
  Better to have more bad / slow memory than less good / fast memory.
 +
  <b>CPU.</b>
 +
  Multi-core and fast processor(s) could make a small advantage for database and replication performance.
 +
* 24/7 service.
 +
  Please avoid any hibernating devices like desktop or especially laptop.
 +
* No new subscription / replication slot without SBN agreement.
 +
  If you have any issues with existing subscription / replication slot, please, try to resolve them first.
 +
  In case of emergency, drop the existing subscription / replication slot first, and then create a new subscriiption (with the same or different name).
 +
* Index(es).
 +
  Current database has a limited set of indexes mostly created by MPC for their own processes.
 +
  Based on your specific needs you could create new index(es) to improve the query performance. Please, pause the subscription, create new index, continue the subscription.

Revision as of 23:09, 24 August 2023

Postgres database replication.


Note that the public database schema is a β release, and may be subject to change over the development period. Tables may be added, removed, or modified by the MPC.

If you decided to replicate a Postgres mpcbeta copy of database from SBN, please, consider "Master" side paragraph FOR YOUR INFORMATION only. Start from the "Replica" side (from the scratch).

“Master” side:

  1. Make sure port 5432 is open.
  2. As postgres user check postgresql.conf:
  listen_addresses should be listening to the replica address.
  wal_level = logical
  1. In psql setup an account and give it at least read access to the database and tables to be replicated:
  CREATE ROLE <username> WITH REPLICATION LOGIN PASSWORD ‘<password>’;
  GRANT CREATE ON DATABASE <dbname> TO <username>;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO <username>;
  1. As postgres user edit pg_hba.conf:
  In the hosts section add a line like:
  host <dbname> <username> <user_IP>/32 md5
  1. Restart the postgres server.
  2. In psql in the <dbname> create a publication:
  [sudo -u postgres] psql <dbname>
  CREATE PUBLICATION <pubname> FOR ALL TABLES;
  or
  CREATE PUBLICATION <pubname> FOR TABLES table1, table2, table3,...;
  1. Dump the schema of all tables in the publication and send them to the replica machine:
  pg_dump -d <dbname> -t <table1> -s > <table1>_schema.sql
  pg_dump -d <dbname> -t <table2> -s > <table2>_schema.sql
  pg_dump -d <dbname> -t <table3> -s > <table3>_schema.sql
     
  (can be done for all tables at once by removing the -t option, but above method makes it easier to troubleshoot any issues).
  pg_dump -d <dbname> > <dbname>_schemas.sql

Items 1,2,6 need to be done 1 time.
Items 3,4,5 need to be done with every new replication
Item 7 could be done once, but with every new replication the <username> should be updated in every *_schema.sql file:

  GRANT SELECT ON TABLE public.neocp_events TO <username>;

“Replica” side (from the scratch):

  1. Install PostgresSQL v14.x
  2. Provide IP-address of the server to SBN (mamoutkine@astro.umd.edu)
  3. Wait for table-structures*.sql files and shell script from SBN
  4. Download tables_structure*.sql files into local server disk with reading permission for all.
  5. Create an empty database (mpcbeta):
  sudo -u postgres psql
  CREATE DATABASE <dbname>
  1. Add empty tables using schemas from the "master" (Run those commands from the same directory where *.sql files):
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  or
  [sudo -u postgres] psql <dbname> < <dbname>_schemas.sql
  1. Create a subscription to the master’s publication:
  sudo -u postgres psql <dbname>
  CREATE SUBSCRIPTION <subname> CONNECTION “host=<master_IP_address> port=5432 user=<username> password=<password> dbname=<dbname>” PUBLICATION <pubname>;

<pubname> in “replica” and “master” sides must match, but the <subname> must not already exist. Multiple subscribers can all connect to the same publication <pubname>.

https://www.digitalocean.com/community/tutorials/how-to-set-up-logical-replication-with-postgresql-10-on-ubuntu-18-04

  1. How to check a replication:
  sudo -u postgres psql mpcbeta
  SELECT * FROM neocp_obs_archive;

Another check:

  sudo -u postgres psql mpcbeta
  SELECT * FROM pg_stat_replication;

Another check: View as a sudo latest log-file: [psql_path]/11/data/log/postgresql-[Wed].log

“Replica” side (with existing logical replication from SBN) - how to add new table(s):

  1. Download all structure_only.sql file into local disk with reading permission to all
  2. Add empty tables using schemas from “master”:
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  1. Modify a subscription to the master’s publication:
  sudo -u postgres psql <dbname>
  ALTER SUBSCRIPTION <subscription_name> REFRESH PUBLICATION;

How to drop a database for complete re-start or version upgrade:

  • If you have a publication and distribute data to somebody - drop the publication:
  [sudo -u postgres] psql <dbname>
  DROP PUBLICATION <pubname>;
  • If you subscribed to another database and have an active connection - drop the subscription:
  [sudo -u postgres psql <dbname>]
  DROP SUBSCRIPTION IF EXISTS <subname>;
  • If you subscribed to another database but connection is broken:
  ALTER SUBSCRIPTION <subname> DISABLE;
  ALTER SUBSCRIPTION <subname> SET (slot_name=NONE);
  DROP	SUBSCRIPTION <subname>;
  • Delete database:
  [sudo -u postgres psql]
  DROP DATABASE <dbname>;
  • If you would like to remove the old PosgreSQL version:
  sudo systemctl stop postgresql-11
  sudo systemctl disable postgresql-11
  sudo yum remove postgresql11
  • To start a new version of database go to "Replica side (from the scratch)"

Best practices for SBN Postgres replication clients:

  • Postgres configuration tune-up.
  As a minimum: https://pgtune.leopard.in.ua/#/
  More detailed: https://pgconfigurator.cybertec-postgresql.com/
  • Hardware.
  SSD storage for data-dir.
  Based on experience, it's the most critical for better performance of the database and replication.
  Even more important, than network bandwidth.
  RAM memory.
  Better to have more bad / slow memory than less good / fast memory.
  CPU.
  Multi-core and fast processor(s) could make a small advantage for database and replication performance.
  • 24/7 service.
  Please avoid any hibernating devices like desktop or especially laptop.
  • No new subscription / replication slot without SBN agreement.
  If you have any issues with existing subscription / replication slot, please, try to resolve them first.
  In case of emergency, drop the existing subscription / replication slot first, and then create a new subscriiption (with the same or different name).
  • Index(es).
  Current database has a limited set of indexes mostly created by MPC for their own processes.
  Based on your specific needs you could create new index(es) to improve the query performance. Please, pause the subscription, create new index, continue the subscription.