This second paper compares PostGreSQL and SQL Server and point out the differences in terms of performances for queries that needs to COUNT.
Our first paper speaks about some comparisons between PostGreSQL and SQL Server in pointing out the differences in terms of performances of somes administrative queries that’s ordinary DBA needs to execute.
Our test material is a HP Z840 workstation with:
- 2 sockets (Intel Xeon E5-2680 v3);
- 128 Gb of RAM;
- OS Windows 10 Enterprise.
The RDBMS are:
- SQL Server 2019 Developper’s edition;
- PostGreSQL v 13.1 community edition.
The client applications used for testing are:
- SQL Server management Studio v 18.5 (for SQL Server);
- PGadmin 4 v 4.29 (for PostGreSQL).
Configuration of RDBMS:
- SQL Server has no particular configuration. The server collation is French_100_BIN2;
- PostGreSQL has a specific parametarization:
1 2 3 4 5 6 7 |
ALTER SYSTEM SET effective_cache_size = 12000000; -- 12 000 000 x 8 192 b : 96 Gb ALTER SYSTEM SET shared_buffers = 2560000; -- 2 560 000 x 8 192 b : 20 Gb ALTER SYSTEM SET work_mem = 500000; -- 500 000 x 1 024 b : 500 Mb ALTER SYSTEM SET temp_buffers = 512000; -- 512 000 x 8 192 b : 4 Gb ALTER SYSTEM SET max_worker_processes = 48; ALTER SYSTEM SET max_parallel_workers = 48; ALTER SYSTEM SET max_parallel_workers_per_gather = 8; |
Configuration of the database:
SQL Server has a storage configuration as follow:
- Data file : 12 Gb;
- Transaction log file : 8 Gb.
PostGreSQL has a specific collation comming with the OS parameterization:
- Encoding = UTF8;
- Collation = French_France.1252;
- Character type = French_France.1252.
Our test protocol was:
- running 10 times each queries;
- eliminating best and worst response time;
- computing the average response time of the 8 remaining elapsed time;
- we stopped the other RDBMS service while testing one.
The SQL queries to reproduce the test can be download at:
https://1drv.ms/u/s!AqvZfiQYoNpBihI4efgbhodhRZec?e=07igoa
- The file « Benchmark test PostGreSQL.sql » contains the SQL script to create the table and populate it, and also all the queries we tested for PostGreSQL;
- The file « Benchmark test SQL Server.sql » contains the SQL script to create the table and populate it, and also all the queries we tested for SQL Server;
- The file « personnes.zip » contains the data to load into the table of our tests (table have two versions). It’s a ZIP file containing a text file of 239 820 Kb inside which you will find 10 million rows of persons.
The tables are:
SQL Server | PostGreSQL |
---|---|
CREATE TABLE T_PERSONNE_PRS (PRS_ID INT IDENTITY PRIMARY KEY, PRS_NOM VARCHAR(32), PRS_PRENOM VARCHAR(25)) | CREATE TABLE T_PERSONNE_PRS (PRS_ID SERIAL PRIMARY KEY, PRS_NOM VARCHAR(32), PRS_PRENOM VARCHAR(25)); |
CREATE TABLE T_PERSONNE_CI_AI_PRS (PRS_ID INT PRIMARY KEY, PRS_NOM VARCHAR(32) COLLATE french_ci_ai, PRS_PRENOM VARCHAR(25) COLLATE french_ci_ai); | CREATE COLLATION fr_ci_ai (provider = icu, locale = 'fr-u-ks-level1-kc-false', deterministic=false); CREATE TABLE t_personne_ci_ai_prs (prs_id INT PRIMARY KEY, prs_nom VARCHAR(32) COLLATE fr_ci_ai, prs_prenom VARCHAR(25) COLLATE fr_ci_ai); |
We had to create a collation for PostGreSQL’s table 2 because PostGreSQL has very few internal collation unlike SQL Server which has over 5500.
Under normal production circumstances, for example with publisher software, this would have been impossible to do. This limitation of PostGreSQL is a real anomaly and if we had had to perform counts without taking into account accents and other diacritics (ligatures for example), it would have been strictly impossible with PostGreSQL!
The COUNT queries are:
Query | |
---|---|
QRY 1 | SELECT COUNT(*) FROM T_PERSONNE_PRS; |
QRY 2 | SELECT COUNT(DISTINCT PRS_PRENOM) FROM T_PERSONNE_PRS; |
QRY 3 | SELECT COUNT(DISTINCT PRS_PRENOM), COUNT(DISTINCT PRS_NOM) FROM T_PERSONNE_PRS; |
QRY 4 | SELECT COUNT(DISTINCT PRS_PRENOM + PRS_NOM) FROM T_PERSONNE_PRS; |
QRY 5 | SELECT PRS_PRENOM, COUNT(*) FROM T_PERSONNE_PRS GROUP BY PRS_PRENOM; |
QRY 6 | SELECT PRS_PRENOM, COUNT(*) FROM T_PERSONNE_PRS GROUP BY PRS_PRENOM HAVING COUNT(*) > 20000; |
QRY 7 | SELECT CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1)), COUNT(*) FROM T_PERSONNE_PRS GROUP BY CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1)); |
QRY 8 | SELECT UPPER(CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1))), COUNT(*) FROM T_PERSONNE_PRS GROUP BY UPPER(CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1))) |
QRY 9 | SELECT CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1)), COUNT(*) FROM T_PERSONNE_CI_AI_PRS GROUP BY CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1)); |
QRY 10 (1) | SELECT CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1)), COUNT(*) FROM T_PERSONNE_CI_AI_PRS WHERE PRS_PRENOM LIKE 'Jean-%' GROUP BY CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1)) |
QRY 10 (PostGreSQL) | SELECT UPPER(CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1))), COUNT(*) FROM T_PERSONNE_PRS WHERE PRS_PRENOM ILIKE 'Jean-%' GROUP BY UPPER(CONCAT(LEFT(PRS_PRENOM, 1), LEFT(PRS_NOM, 1))); |
(1) QRY 10 raised an exception on PostGreSQL, because PostGreSQL is unable to run COLLATE operator properly with some other operator like the LIKE SQL operator. We have rewrite the query using another table with the ILIKE operator – QRY 10 (PostGreSQL).
Performances of COUNT without any index:
SQL Server execution time (ms) | PostGreSQL execution time (ms) | ratio | |
---|---|---|---|
QRY 1 | 26 | 290 | 11.15 |
QRY 2 | 79 | 34770 | 440.23 |
QRY 3 | 152 | 69000 | 453.95 |
QRY 4 | 1466 | 88000 | 60.03 |
QRY 5 | 76 | 873 | 11.49 |
QRY 6 | 78 | 756 | 9.69 |
QRY 7 | 222 | 5959 | 26.84 |
QRY 8 | 257 | 8920 | 34.71 |
QRY 9 | 212 | 12776 | 60.26 |
QRY 10 (1) | 140 | 2695 | 19.25 |
PostGreSQL is between 10 and 454 times slower than SQL Server for COUNT without indexes.
Performances of COUNT with indexes:
Index created are as follow:
- CREATE INDEX X_1 ON T_PERSONNE_PRS (PRS_NOM);
- CREATE INDEX X_2 ON T_PERSONNE_PRS (PRS_PRENOM);
- CREATE INDEX X_3 ON T_PERSONNE_PRS (PRS_NOM, PRS_PRENOM);
SQL Server execution time (ms) | PostGreSQL execution time (ms) | ratio | |
---|---|---|---|
QRY 1 | 28 | 305 | 10.89 |
QRY 2 | 61 | 32995 | 540.41 |
QRY 3 | 123 | 69000 | 560.98 |
QRY 4 | 1367 | 87000 | 63.64 |
QRY 5 | 123 | 826 | 6.72 |
QRY 6 | 93 | 716 | 7.70 |
QRY 7 | 346 | 9178 | 15.57 |
QRY 8 | 238 | 12873 | 38.56 |
QRY 9 | 123 | 73.98 | |
QRY 10 (1) | 140 | 2755 | 22.40 |
PostGreSQL is between 7 and 561 times slower than SQL Server for COUNT with appropriate indexes.
SQL Server is better with indexes for queries 2, 8, 9, 10 but the gain is very few. SQL Server is worst with indexes for queries 5, 6, 7, but the loss is also very few.
PostGreSQL has nearly no gain or loss with or without index…
Performances of COUNT with indexes and compression:
We have compressed tables and index in SQL Server (there is no compression in PostGreSQL):
- ALTER TABLE T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
- ALTER INDEX X_1 ON T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
- ALTER INDEX X_2 ON T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
- ALTER INDEX X_3 ON T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
SQL Server execution time (ms) | PostGreSQL execution time (ms) | ratio | |
---|---|---|---|
QRY 1 | 34 | 305 | 8.97 |
QRY 2 | 100 | 32995 | 329.65 |
QRY 3 | 161 | 69000 | 428.57 |
QRY 4 | 1401 | 87000 | 62.10 |
QRY 5 | 228 | 826 | 3.62 |
QRY 6 | 129 | 716 | 5.55 |
QRY 7 | 345 | 9178 | 17.62 |
QRY 8 | 279 | 12873 | 32.90 |
QRY 9 | 239 | 53.86 | |
QRY 10 (1) | 161 | 2755 | 17.11 |
SQL Server is worst with indexes and compression for all queries but the loss is very few (always <2).
The compression ratio (there is no NULL inside the table data) is about a factor 2.
Performances of COUNT with columnstore index:
We have drop all indexes in SQL Server and replace them by one single columstore index:
- CREATE COLUMNSTORE INDEX XC ON T_PERSONNE_CI_AI_PRS (PRS_PRENOM, PRS_NOM);
SQL Server execution time (ms) | PostGreSQL execution time (ms) | ratio | |
---|---|---|---|
QRY 1 | 12 | 305 | 25.42 |
QRY 2 | 31 | 32995 | 1063.39 |
QRY 3 | 45 | 69000 | 1533.33 |
QRY 4 | 1129 | 87000 | 77.06 |
QRY 5 | 21 | 826 | 39.33 |
QRY 6 | 17 | 716 | 42.12 |
QRY 7 | 203 | 9178 | 29.95 |
QRY 8 | 266 | 12873 | 34.50 |
QRY 9 | 199 | 64.69 | |
QRY 10 (1) | 19 | 2755 | 145.00 |
SQL Server is better with columnstore index rather than with BTree indexes in a factor between 2 and 7, except for query 4, 8 and 9 (data transformations).
What we can say about the performance of the COUNT queries between SQL Server and PostGreSQL, are the following:
PostGreSQL is incredibly slow for COUNT DISTINCT: SQL Server is between 61 and 561 times faster in all situations, and with columnstore index SQL Server can be 1,533 time faster (Query 3)…
PostGreSQL is very slow for ordinary COUNT:
- SQL Server is between 10 and 60 times faster without indexes
- SQL Server is between 7 and 74 times faster with appropriate indexes
- SQL Server is between 4 and 54 times faster with appropriate indexes and compression (that PG does not have)
- SQL Server is between 25 and 145 times faster with a columnstore index ().
NOTE : PostGreSQL can have vertical indexing (column store rather than row store), but you need to migrate from the community edition to Fujitsu or Citus.
Globaly PostGreSQL is 114 times slower than SQL Server in COUNT queries…
Why is PostGreSQL so slow ?
PostGreSQL process to count need to scan every pages of data, reading one by one the record slots to know if the slot is not a phantom one, due to the fact that the MVCC creates multiple row versions inside the table data pages when concurrent transactions writes data, while other RDBMS, does it in a specific storage (transaction log, tempdb…). Also PostGreSQL use only one thread to do the COUNT, despite the parallelism options setted, and must compares data to distincts it when a COUNT DISTINCT is requiered.
To do its counting SQL Server reads only the page header of the datapages and does it in parallel (up to 48 threads in our case). All the page headers contains two bytes, at the same location, in which it finds the number of living rows. But I don’t know the trick used by SQL Server to be so fast for the COUNT DISTINCT…. It seems magic! Of course the research and development department of Microsoft is one of the best… Even Oracle database is slower compared to SQL Server !
Frédéric Brouard - SQLpro - ARCHITECTE DE DONNÉES - expert SGBDR et langage SQL
* Le site sur les SGBD relationnels et le SQL : https://sqlpro.developpez.com *
* le blog SQL, SQL Server, SGBDR... sur : https://blog.developpez.com/sqlpro/ *
* Entreprise SQL SPOT : modélisation, conseil, audit, optimisation, formation *
* Site d'entreprise : https://www.sqlspot.com -- email : SQLpro[@]SQLspot.com *
* * * * * Enseignant CNAM PACA - ISEN Toulon - CESI Aix en Provence * * * * *
PREVIOUS :
PostGreSQL vs Microsoft SQL Server – Comparison part 1 : DBA command performances
NEXT : PostGreSQL vs Microsoft SQL Server – Comparison part 3 : FEATURES (to come near by 2021-04-01)