SBN MPC Wiki

From The SBN Wiki
Jump to navigation Jump to search

Postgres database replication.


Note that the public database schema is a β release, and may be subject to change over the development period. Tables may be added, removed, or modified by the MPC.

If you decided to replicate a Postgres mpc_sbn copy of database from SBN, please, consider "Master" side paragraph FOR YOUR INFORMATION only. Start from the "Replica" side (from the scratch).

“Master” side:

  1. Make sure port 5432 is open.
  2. As postgres user check postgresql.conf:
  listen_addresses should be listening to the replica address.
  wal_level = logical
  1. In psql setup an account and give it at least read access to the database and tables to be replicated:
  CREATE ROLE <username> WITH REPLICATION LOGIN PASSWORD ‘<password>’;
  GRANT CREATE ON DATABASE <dbname> TO <username>;
  GRANT SELECT ON ALL TABLES IN SCHEMA public TO <username>;
  1. As postgres user edit pg_hba.conf:
  In the hosts section add a line like:
  host <dbname> <username> <user_IP>/32 md5
  1. Restart the postgres server.
  2. In psql in the <dbname> create a publication:
  [sudo -u postgres] psql <dbname>
  CREATE PUBLICATION <pubname> FOR ALL TABLES;
  or
  CREATE PUBLICATION <pubname> FOR TABLES table1, table2, table3,...;
  1. Dump the schema of all tables in the publication and send them to the replica machine:
  pg_dump -d <dbname> -t <table1> -s > <table1>_schema.sql
  pg_dump -d <dbname> -t <table2> -s > <table2>_schema.sql
  pg_dump -d <dbname> -t <table3> -s > <table3>_schema.sql
     
  (can be done for all tables at once by removing the -t option, but above method makes it easier to troubleshoot any issues).
  pg_dump -d <dbname> > <dbname>_schemas.sql

Items 1,2,6 need to be done 1 time.
Items 3,4,5 need to be done with every new replication
Item 7 could be done once, but with every new replication the <username> should be updated in every *_schema.sql file:

  GRANT SELECT ON TABLE public.neocp_events TO <username>;

“Replica” side (from the scratch):

  1. Install PostgresSQL v14.x
  2. Provide IP-address of the server to SBN (mamoutkine@astro.umd.edu)
  3. Wait for table-structures*.sql files and shell script from SBN
  4. Download tables_structure*.sql files into local server disk with reading permission for all.
  5. Create an empty database (mpc_sbn):
  sudo -u postgres psql
  CREATE DATABASE <dbname>
  1. Add empty tables using schemas from the "master" (Run those commands from the same directory where *.sql files):
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  or
  [sudo -u postgres] psql <dbname> < <dbname>_schemas.sql
  1. Create a subscription to the master’s publication:
  sudo -u postgres psql <dbname>
  CREATE SUBSCRIPTION <subname> CONNECTION “host=<master_IP_address> port=5432 user=<username> password=<password> dbname=<dbname>” PUBLICATION <pubname>;

<pubname> in “replica” and “master” sides must match, but the <subname> must not already exist. Multiple subscribers can all connect to the same publication <pubname>.

https://www.digitalocean.com/community/tutorials/how-to-set-up-logical-replication-with-postgresql-10-on-ubuntu-18-04

  1. How to check a replication:
  sudo -u postgres psql mpc_sbn
  SELECT * FROM neocp_obs_archive;

Another check:

  sudo -u postgres psql mpc_sbn
  SELECT * FROM pg_stat_replication;

Another check: View as a sudo latest log-file: [psql_path]/11/data/log/postgresql-[Wed].log

“Replica” side (with existing logical replication from SBN) - how to add new table(s):

  1. Download all structure_only.sql file into local disk with reading permission to all
  2. Add empty tables using schemas from “master”:
  [sudo -u postgres] psql -d <dbname> -a < <table1>_schema.sql
  1. Modify a subscription to the master’s publication:
  sudo -u postgres psql <dbname>
  ALTER SUBSCRIPTION <subscription_name> REFRESH PUBLICATION;

How to drop a database for complete re-start or version upgrade:

  • If you have a publication and distribute data to somebody - drop the publication:
  [sudo -u postgres] psql <dbname>
  DROP PUBLICATION <pubname>;
  • If you subscribed to another database and have an active connection - drop the subscription:
  [sudo -u postgres psql <dbname>]
  DROP SUBSCRIPTION IF EXISTS <subname>;
  • If you subscribed to another database but connection is broken:
  ALTER SUBSCRIPTION <subname> DISABLE;
  ALTER SUBSCRIPTION <subname> SET (slot_name=NONE);
  DROP	SUBSCRIPTION <subname>;
  • Delete database:
  [sudo -u postgres psql]
  DROP DATABASE <dbname>;
  • If you would like to remove the old PosgreSQL version:
  sudo systemctl stop postgresql-11
  sudo systemctl disable postgresql-11
  sudo yum remove postgresql11
  • To start a new version of database go to "Replica side (from the scratch)"

Best practices for SBN Postgres replication clients:

  • Postgres configuration tune-up.
  As a minimum: https://pgtune.leopard.in.ua/#/
  More detailed: https://pgconfigurator.cybertec-postgresql.com/
  • Hardware.
  SSD storage for data-dir.
  Based on experience, it's the most critical for better performance of the database and replication.
  Even more important, than network bandwidth.
  RAM memory.
  Better to have more bad / slow memory than less good / fast memory.
  CPU.
  Multi-core and fast processor(s) could make a small advantage for database and replication performance.
  Example of hardware specifications for the server to replicate the MPC Postgres database:
  CPU - quad core or more
  RAM - 32-48 GB
  SSD - 4 TB
  Network - 1-10 Gbps would be a good compliment.
  • 24/7 service.
  Please avoid any hibernating devices like desktop or especially laptop.
  • No new subscription / replication slot without SBN agreement.
  If you have any issues with existing subscription / replication slot, please, try to resolve them first.
  In case of emergency, drop the existing subscription first, drop data from all tables, and then create a new subscription (with the same or different name).

Index(es) and Table(s).

  • In addition to the table structures and data, the logical replication of the current database includes index(es) created by MPC to support their processes.
  • You (the Subscriber) could create custom index(es) on your local copy of the database to improve the query performance specific to your processes. Please, pause the subscription, create new index, continue the subscription:
  ALTER SUBSCRIPTION <your_sub> DISABLE;
  CREATE INDEX idx_table_column ON table(column);
  ALTER SUBSCRIPTION <your_sub> ENABLE;
  • You also could add custom table(s) to your local copy of the database. Similar, pause the subscription, create new table, continue subscription:
  ALTER SUBSCRIPTION <your_sub> DISABLE;
  CREATE TABLE distributors (
    did     integer PRIMARY KEY,
    name    varchar(40)
  );
  ALTER SUBSCRIPTION <your_sub> ENABLE;

MPC Distributed Postgres Database Terms Of Service

Version 1.0, 9/28/2023

These terms of service ("Terms") govern your replication, access, and use of the data distributed from PostgreSQL database ("Database", or alternatively, the MPCDB), a public copy of the Minor Planet Center’s (MPC’s) live database, hosted from the NASA Planetary Data System’s Small Bodies Node (“SBN”), located at the University of Maryland. This Database contains all publicly distributed data collected and produced by the MPC, and has live copies of the central processing tables. The database design base is Postgres and uses the Postgres built-in subscription capability.

Descriptions of the database content and structure are provided elsewhere (see the schema posted on the MPC’s website: https://data.minorplanetcenter.net/postgres-schema/schema.html). By replicating, accessing, or using the Database, you agree to be bound by these Terms. If you disagree with these Terms, you may not replicate, access, or use the Database.

  1. Description of Service

We offer a service that allows entities to replicate, access, and use data from our Database for their internal purposes. SBN is distributing the Database from the Minor Planet Center (“MPC”) as a courtesy.

  1. License

We grant you a non-exclusive, non-transferable, non-sublicensable license to replicate, access, and use our database's data for your internal purposes.

  1. Intellectual Property

All intellectual property rights in the Database, including but not limited to copyrights, trademarks, patents, and trade secrets, are owned by us or our licensors. To prevent confusion, any modifications or derived data products must be explicitly and clearly marked as such if they are redistributed.

  1. Restrictions

You agree not to use the Database for any illegal purpose. You may not replicate, access, or use the Database in violation of these Terms or any applicable law or regulation. You may not use the Database to harass, defame, or threaten any person or entity.

  1. Limitation of Liability

SBN and MPC are not responsible for any damages, losses, or liabilities arising from your use of the Database, including but not limited to any errors, omissions, interruptions, or delays. We do not guarantee the accuracy or completeness of any data in the Database, and we are not liable for any reliance on such data. Users are responsible for their resource management.

  1. Termination

This service can be negatively impacted by users at an individual endpoint. We may therefore find it necessary to terminate your access to the Database at any time for any reason. Some occurrences that may result in such termination are mentioned in Terms 11-13 below. This software should not be run on a machine that is not intended to be online 24 hours per day. Alternative means of acquiring specific information within the database will be provided; please contact SBN (see Term 10 below) for these services.

  1. Governing Law

These Terms are governed by and construed in accordance with the laws of the jurisdiction in which we are located. Any dispute arising from or relating to these Terms shall be resolved through binding arbitration in accordance with the rules of the American Arbitration Association.

  1. Changes to Terms

We reserve the right to modify or update these Terms at any time. If we make material changes to these Terms, we will notify you by email or by posting a notice on our website (https://sbnmpc.astro.umd.edu/MPC_database/statusDB.shtml). Your continued use of the Database following any such changes constitutes your acceptance of the revised Terms.

  1. User liability

The User is responsible for responding reasonably to SBN requests related to the Database distribution service, safety, security, or integrity.

  1. Contact Information

If you have any questions about these Terms or the Database, please contact us using the MPC helpdesk [1] under the category of “Database Replication”.

  1. Operating system and Postgres version requirements

The live copy of the relational system is a Postgres database. To ensure integrity and longevity of the distribution system, SBN requires that the system to which the live copy is distributed (the user’s system) runs an operating system that supports the earliest supported Postgres version and distribution. Currently (9/28/23) the SBN supports Postgres V14 and higher.

  1. Download time constraints

The MPCDB tables may grow quickly in size with the advent of new surveys. Consequently, SBN requires the user’s system to complete the initial download within 3 days. Furthermore, the user’s full system must be able to keep up with the database throughput rate,requiring consideration of factors such as system specification, system configuration, and data transfer bandwidth. Notional thresholds regarding these factors will be provided in the “Best Practices” section of the sbn mpc wiki pages (https://sbnwiki.astro.umd.edu/wiki/SBN_MPC_Wiki), and may be upgraded depending on the average database volume growth from successive generational evolution of survey detection volumes.

  1. User’s system downtime

In order to avoid backlogs in distribution transactions, we require that the user’s do not install the system on a transient server, such as a laptop that often travels with prolonged downtime. We require that the downtime be no more than an average of 14 days (cumulative) per year. The user is to notify SBN if a subscription is no longer being used.

  1. Maximum Institutional copies

We request no more than 4 copies be distributed from SBN to the same institution without contacting the SBN regarding additional copies; SBN grants permissions to duplicate the database within the institution.

  1. User Feedback

The MPC helpdesk shall be used for first reporting user related problems, rather than contacting SBN personnel directly. The MPCDB stewards at the SBN receive tickets through the MPC’s helpdesk.

Appendix A. Technical thresholds.

Violation of any parameter from below list will result in (semi)-automatic termination of the distribution to the institution. Replication could be restored after fixing the source of violation. Technical thresholds are subject to periodic review and modification.

  • Inactive replication slot.
  • Difference between “normal” XMIN (fully synchronized replication slot) and “replication behind” XMIN > 5%.
  • First part of LSN (Log Sequence Number) of the institution replication slot is different from the fully synchronous replication slot LSN. (For example, “normal” LSN 4D1/C5DD9EB0, “will be terminated” LSN 4C8/A4558A78).