UUID performance improvement considerations
Description
Environment
8.2
Attachments
relates to
Activity
Carlos Ruiz October 30, 2023 at 1:21 PM
Hi
It is kind of self contracting that the implementation is not tuned for storage and performance efficiency.
I think is not contradicting, my assessment is about benefit/cost.
We can make it storage and performant efficient at the cost of not being backward compatible. We can measure the impact on core (which is huge), but we cannot measure the impact on plugins, reports, integrations out there - which I assume can be also huge.
So, no, I think is not contradicting.
Carlos Ruiz October 30, 2023 at 1:12 PM
- that was the first intention of ticket
You can read in the description, initially that was the exploration, change all numerics, I started changing things and the changes were cascading and cascading, affecting the whole core, and potentially breaking all plugins, so it was decided that a better approach was to move to support UUID key based tables, and the ticket changed to that goal.
There are a few things that we can do to improve disk and performance still using Strings for UUID:
remove the dashes from the UUID. I haven’t explored this, but it sounds possible and that would save 4 characters on each UUID, so, instead of varchar(36) we would need varchar(32), the dashes would need to be added back programatically to be compliant with the UUID format. This means a gain of ~ 11% of disk space. I’m not too eager to do this change, the gain is probably too little to be worthy anyways.
change from VARCHAR to CHAR. From what I read, VARCHAR uses one extra byte always to save the length of the string, and is unnecessary when our string has always fixed length.
This change seems simple, backward compatible and non-breaking, and is probably very worthy, we can also dismiss the CHAR suffix on oracle, I mean instead of CHAR(36 CHAR) use just CHAR(36)Postgresql → VARCHAR(36) → CHAR(36)
Oracle → VARCHAR2(36 CHAR) → CHAR(36)
Deepak Pansheriya October 30, 2023 at 11:42 AM
/ for supporting huge amound of data, what if we convert ID column to be bigserial
in postgresql and uni64_t type in oracle?
Integer key column are more performant than String.
Heng Sin Low October 25, 2023 at 8:36 AM
Just for the record:
The PostgreSQL impact analysis below for generate_uuid() is not accurate since we can change the function to return uuid instead of varchar(36).
Heng Sin Low October 20, 2023 at 3:40 PM
One of the primary reason we want to have UUID only key table is for storage of huge amount of data (i.e more than what the current integer primary key can support). It is kind of self contracting that the implementation is not tuned for storage and performance efficiency.
Anyway, since it is already October and the impact is significant, I guess we can keep this open as a feature request and perhaps re-visit this again in future release (i.e post 11).
UUID fields in all tables should be improved, follow:
A. data type is varchar(36) - potential change to UUID type: (Postgres and oracle support UUID type)
theoretically (ERROR: default for column "
ad_note_uu
" cannot be cast automatically to type uuid)add/remove (conflict with views)
benefit: point B.
B. uuid B-Tree index size is HALF then varchar (36)
I did tests, where the index size is half. in my case, ad_note_uu_idx it was 105MB after approx 50MB
our max uuid index size is around 433MB approx 5mill rows. assume around 200MB.
topic confirm my assumptions:
C. indexes UUID, never scanned - however, updated on INSERT/UPDATE/DELETE
can implementor drop this index securely if needed? because in many cases useless, never used.
huge tables with millions of rows should benefit from drop uu indexes.
exported by: