PostGreSQL vs Microsoft SQL Server – Comparison part 2 : COUNT performances

This second paper compares PostGreSQL and SQL Server and point out the differences in terms of performances for queries that needs to COUNT.


Our first paper speaks about some comparisons between PostGreSQL and SQL Server in pointing out the differences in terms of performances of somes administrative queries that’s ordinary DBA needs to execute.

Our test material is a HP Z840 workstation with:

  • 2 sockets (Intel Xeon E5-2680 v3);
  • 128 Gb of RAM;
  • OS Windows 10 Enterprise.

The RDBMS are:

  • SQL Server 2019 Developper’s edition;
  • PostGreSQL v 13.1 community edition.

The client applications used for testing are:

  • SQL Server management Studio v 18.5 (for SQL Server);
  • PGadmin 4 v 4.29 (for PostGreSQL).

Configuration of RDBMS:

  • SQL Server has no particular configuration. The server collation is French_100_BIN2;
  • PostGreSQL has a specific parametarization:

Configuration of the database:

SQL Server has a storage configuration as follow:

  • Data file : 12 Gb;
  • Transaction log file : 8 Gb.

PostGreSQL has a specific collation comming with the OS parameterization:

  • Encoding = UTF8;
  • Collation = French_France.1252;
  • Character type = French_France.1252.

Our test protocol was:

  • running 10 times each queries;
  • eliminating best and worst response time;
  • computing the average response time of the 8 remaining elapsed time;
  • we stopped the other RDBMS service while testing one.

The SQL queries to reproduce the test can be download at:

https://1drv.ms/u/s!AqvZfiQYoNpBihI4efgbhodhRZec?e=07igoa

  • The file « Benchmark test PostGreSQL.sql » contains the SQL script to create the table and populate it, and also all the queries we tested for PostGreSQL;
  • The file « Benchmark test SQL Server.sql » contains the SQL script to create the table and populate it, and also all the queries we tested for SQL Server;
  • The file « personnes.zip » contains the data to load into the table of our tests (table have two versions). It’s a ZIP file containing a text file of 239 820 Kb inside which you will find 10 million rows of persons.

The tables are:

SQL ServerPostGreSQL
CREATE TABLE T_PERSONNE_PRS
(PRS_ID INT IDENTITY PRIMARY KEY,
PRS_NOM VARCHAR(32),
PRS_PRENOM VARCHAR(25))
CREATE TABLE T_PERSONNE_PRS
(PRS_ID SERIAL PRIMARY KEY,
PRS_NOM VARCHAR(32),
PRS_PRENOM VARCHAR(25));
CREATE TABLE T_PERSONNE_CI_AI_PRS
(PRS_ID INT PRIMARY KEY,
PRS_NOM VARCHAR(32) COLLATE french_ci_ai,
PRS_PRENOM VARCHAR(25) COLLATE french_ci_ai);
CREATE COLLATION fr_ci_ai
(provider = icu,
locale = 'fr-u-ks-level1-kc-false',
deterministic=false);

CREATE TABLE t_personne_ci_ai_prs
(prs_id INT PRIMARY KEY,
prs_nom VARCHAR(32) COLLATE fr_ci_ai,
prs_prenom VARCHAR(25) COLLATE fr_ci_ai);

We had to create a collation for PostGreSQL’s table 2 because PostGreSQL has very few internal collation unlike SQL Server which has over 5500.
Under normal production circumstances, for example with publisher software, this would have been impossible to do. This limitation of PostGreSQL is a real anomaly and if we had had to perform counts without taking into account accents and other diacritics (ligatures for example), it would have been strictly impossible with PostGreSQL!

The COUNT  queries are:

 Query
QRY 1SELECT COUNT(*) FROM T_PERSONNE_PRS;
QRY 2SELECT COUNT(DISTINCT PRS_PRENOM)
FROM T_PERSONNE_PRS;
QRY 3SELECT COUNT(DISTINCT PRS_PRENOM),
COUNT(DISTINCT PRS_NOM)
FROM T_PERSONNE_PRS;
QRY 4SELECT COUNT(DISTINCT PRS_PRENOM + PRS_NOM)
FROM T_PERSONNE_PRS;
QRY 5SELECT PRS_PRENOM, COUNT(*)
FROM T_PERSONNE_PRS
GROUP BY PRS_PRENOM;
QRY 6SELECT PRS_PRENOM, COUNT(*)
FROM T_PERSONNE_PRS
GROUP BY PRS_PRENOM
HAVING COUNT(*) > 20000;
QRY 7SELECT CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1)),
COUNT(*)
FROM T_PERSONNE_PRS
GROUP BY CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1));
QRY 8SELECT UPPER(CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1))),
COUNT(*)
FROM T_PERSONNE_PRS
GROUP BY UPPER(CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1)))
QRY 9SELECT CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1)),
COUNT(*)
FROM T_PERSONNE_CI_AI_PRS
GROUP BY CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1));
QRY 10 (1)SELECT CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1)),
COUNT(*)
FROM T_PERSONNE_CI_AI_PRS
WHERE PRS_PRENOM LIKE 'Jean-%'
GROUP BY CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1))
QRY 10 (PostGreSQL)SELECT UPPER(CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1))),
COUNT(*)
FROM T_PERSONNE_PRS
WHERE PRS_PRENOM ILIKE 'Jean-%'
GROUP BY UPPER(CONCAT(LEFT(PRS_PRENOM, 1),
LEFT(PRS_NOM, 1)));

(1) QRY 10 raised an exception on PostGreSQL, because PostGreSQL is unable to run COLLATE operator properly with some other operator like the LIKE SQL operator. We have rewrite the query using another table with the ILIKE operator – QRY 10 (PostGreSQL).

Performances of COUNT without any index:

 SQL Server execution time (ms)PostGreSQL execution time (ms)ratio
QRY 12629011.15
QRY 27934770440.23
QRY 315269000453.95
QRY 414668800060.03
QRY 57687311.49
QRY 6787569.69
QRY 7222595926.84
QRY 8257892034.71
QRY 92121277660.26
QRY 10 (1)140269519.25

PostGreSQL is between 10 and 454 times slower than SQL Server for COUNT without indexes.

Performances of COUNT with indexes:

Index created are as follow:

  • CREATE INDEX X_1 ON T_PERSONNE_PRS (PRS_NOM);
  • CREATE INDEX X_2 ON T_PERSONNE_PRS (PRS_PRENOM);
  • CREATE INDEX X_3 ON T_PERSONNE_PRS (PRS_NOM, PRS_PRENOM);
 SQL Server execution time (ms)PostGreSQL execution time (ms)ratio
QRY 12830510.89
QRY 26132995540.41
QRY 312369000560.98
QRY 413678700063.64
QRY 51238266.72
QRY 6937167.70
QRY 7346917815.57
QRY 82381287338.56
QRY 912373.98
QRY 10 (1)140275522.40

PostGreSQL is between 7 and 561 times slower than SQL Server for COUNT with appropriate indexes.

SQL Server is better with indexes for queries 2, 8, 9, 10 but the gain is very few. SQL Server is worst with indexes for queries 5, 6, 7, but the loss is also very few.

PostGreSQL has nearly no gain or loss with or without index…

 

Performances of COUNT with indexes and compression:

We have compressed tables and index in SQL Server (there is no compression in PostGreSQL):

  • ALTER TABLE T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
  • ALTER INDEX X_1 ON T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
  • ALTER INDEX X_2 ON T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
  • ALTER INDEX X_3 ON T_PERSONNE_PRS REBUILD WITH(DATA_COMPRESSION = PAGE);
 SQL Server execution time (ms)PostGreSQL execution time (ms)ratio
QRY 1343058.97
QRY 210032995329.65
QRY 316169000428.57
QRY 414018700062.10
QRY 52288263.62
QRY 61297165.55
QRY 7345917817.62
QRY 82791287332.90
QRY 923953.86
QRY 10 (1)161275517.11

SQL Server is worst with indexes and compression for all queries but the loss is very few (always <2).

The compression ratio (there is no NULL inside the table data) is about a factor 2.

 

Performances of COUNT with columnstore index:

We have drop all indexes in SQL Server and replace them by one single columstore index:

  • CREATE COLUMNSTORE INDEX XC ON T_PERSONNE_CI_AI_PRS (PRS_PRENOM, PRS_NOM);
 SQL Server execution time (ms)PostGreSQL execution time (ms)ratio
QRY 11230525.42
QRY 231329951063.39
QRY 345690001533.33
QRY 411298700077.06
QRY 52182639.33
QRY 61771642.12
QRY 7203917829.95
QRY 82661287334.50
QRY 919964.69
QRY 10 (1)192755145.00

SQL Server is better with columnstore index rather than with BTree indexes in a factor between 2 and 7, except for query 4, 8 and 9 (data transformations).

 

What we can say about the performance of the COUNT queries between SQL Server and PostGreSQL, are the following:

PostGreSQL is incredibly slow for COUNT DISTINCT: SQL Server is between 61 and 561 times faster in all situations, and with columnstore index SQL Server can be 1,533 time faster (Query 3)…

PostGreSQL is very slow for ordinary COUNT:

  • SQL Server is between 10 and 60 times faster without indexes
  • SQL Server is between 7 and 74 times faster with appropriate indexes
  • SQL Server is between 4 and 54 times faster with appropriate indexes and compression (that PG does not have)
  • SQL Server is between 25 and 145 times faster with a columnstore index ().

NOTE : PostGreSQL can have vertical indexing (column store rather than row store), but you need to migrate from the community edition to Fujitsu or Citus.

Globaly PostGreSQL is 114 times slower than SQL Server in COUNT queries…

COUNT performances: PostGreSQL vs SQL Server

 

Why is PostGreSQL so slow ?

PostGreSQL process to count need to scan every pages of data, reading one by one the record slots to know if the slot is not a phantom one, due to the fact that the MVCC creates multiple row versions inside the table data pages when concurrent transactions writes data, while other RDBMS, does it in a specific storage (transaction log, tempdb…). Also PostGreSQL use only one thread to do the COUNT, despite the parallelism options setted, and must compares data to distincts it when a COUNT DISTINCT is requiered.

To do its counting SQL Server reads only the page header of the datapages and does it in parallel (up to 48 threads in our case). All the page headers contains two bytes, at the same location, in which it finds the number of living rows. But I don’t know the trick used by SQL Server to be so fast for the COUNT DISTINCT…. It seems magic! Of course the research and development department of Microsoft is one of the best… Even Oracle database is slower compared to SQL Server !

 


Frédéric Brouard - SQLpro - ARCHITECTE DE DONNÉES - expert SGBDR et langage SQL
* Le site sur les SGBD relationnels et le SQL : https://sqlpro.developpez.com *
* le blog SQL, SQL Server, SGBDR... sur : https://blog.developpez.com/sqlpro/ *
* Entreprise SQL SPOT : modélisation, conseil, audit, optimisation, formation *
* Site d'entreprise : https://www.sqlspot.com -- email : SQLpro[@]SQLspot.com *
* * * * * Enseignant CNAM PACA - ISEN Toulon - CESI Aix en Provence * * * * *

PREVIOUS :

PostGreSQL vs Microsoft SQL Server – Comparison part 1 : DBA command performances

NEXT : PostGreSQL vs Microsoft SQL Server – Comparison part 3 : FEATURES (to come near by 2021-04-01)

 

Ce contenu a été publié dans Performances, PostGreSQL, SQL Server, avec comme mot(s)-clé(s) , , , , , . Vous pouvez le mettre en favoris avec ce permalien.