Performance
Processing a journal of one thousand transactions takes about 50 ms (0.05 seconds), and ten thousand transactions take about half a second.
Tackler-NG can parse and process one million (1E6) transactions in around 15 to 30 seconds, with a processing speed of 40 000 to 100 000 transactions per second on a normal laptop computer.
Test results
Tackler is performance tested to validate the sanity of the algorithms used and the overall performance of the system. These are ballpark figures and should be treated as such.
Artificial data is used for the performance tests. The test sets contain 1E3, 1E4, 1E5 and 1E6 transactions. Tests are run for all report types, both with and without transaction filtering.
Both filesystem and Git based storage are tested. The Git storage backend is performance tested with the same data sets as the filesystem based storage, and it has the same or better performance than the filesystem backend.
There are five runs for each report type; the fastest and slowest runs are dropped from the results, and the average is calculated from the remaining three values.
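The averaging itself is simple. As a minimal sketch of the idea (this is not Tackler's actual benchmark harness, and the run times below are made up):

```rust
// Drop the fastest and slowest of five run times, average the remaining three.
fn trimmed_average(mut runs: [f64; 5]) -> f64 {
    runs.sort_by(|a, b| a.partial_cmp(b).expect("run times must be comparable"));
    // After sorting, indices 1..4 hold the three middle values.
    runs[1..4].iter().sum::<f64>() / 3.0
}

fn main() {
    // Hypothetical run times in seconds for one report type.
    let runs = [0.051, 0.048, 0.050, 0.049, 0.062];
    println!("average: {:.3} s", trimmed_average(runs));
}
```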
The test setup and full results are described in the Performance Readme document.
Results for HW02: Quad core system
This system is a laptop with a quad-core CPU: four cores and eight threads. All journal data can be cached in memory.
The initial Tackler-NG implementation is single-threaded and uses a single core at the moment.
Balance report, CPU utilization is around 100%:
- 1E3 txns: 0.04 sec, 12 MB
- 1E4 txns: 0.32 sec, 22 MB
- 1E5 txns: 3.28 sec, 100 MB
- 1E6 txns: 33 sec, 800 MB
Full results: Perf HW02
Test data
Performance testing is done with artificial transaction data which mimics a real journal. The test data is created with a generator, which can also produce ledger-like compatible journal formats.
Tests are done with all report and format types. Account validation is turned on (e.g. strict.mode=true), so every account is checked to be defined in the Chart of Accounts.
Each transaction is located in its own file, and the sharding of transaction journals is based on txn dates (e.g. one transaction would be perf-1E6/YYYY/MM/DD/YYYYMMDDTHHMMSS-idx.txn, where idx is the index of the txn).
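For illustration, a minimal sketch of building such a date-based shard path (the function name and arguments are hypothetical, not part of Tackler's API or of the test data generator):

```rust
// Build a shard path like perf-1E6/YYYY/MM/DD/YYYYMMDDTHHMMSS-idx.txn
// from a timestamp and a transaction index.
fn shard_path(year: u32, month: u32, day: u32, hms: &str, idx: u64) -> String {
    format!(
        "perf-1E6/{year:04}/{month:02}/{day:02}/{year:04}{month:02}{day:02}T{hms}-{idx}.txn"
    )
}

fn main() {
    // e.g. perf-1E6/2016/03/14/20160314T120000-42.txn
    println!("{}", shard_path(2016, 3, 14, "120000", 42));
}
```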
The test sets used (small and big) are:
- 1E3 (1 000) transactions
- 1E4 (10 000) transactions
- 1E5 (100 000) transactions
- 1E6 (1 000 000) transactions
Chart of Accounts for perf tests
The Chart of Accounts has 378 entries and it is generated based on the txns' dates:
For "assets" following structure is used a:ay<year>:am<month>
.
This yields to 12 different accounts:
... "a:ay2016:am01", "a:ay2016:am02", "a:ay2016:am03", ...
For "expenses" following structure is used e:ey<year>:em<month>:ed<day>
.
This yields to 366 different accounts:
... "e:ey2016:em01:ed01", "e:ey2016:em01:ed02", "e:ey2016:em01:ed03", ...
Single file or single input String
Tackler has the same processing speed regardless of how the input is structured, e.g. the processing time is the same for one big single journal file as it is for sharded transaction directories.
However, to optimize Tackler's memory usage, bigger transaction sets of over one hundred thousand (1E5) transactions should be split over multiple files, using the FS or Git storage system with a shard directory structure. One common way to split transaction journals is by dates, or by producers (each having their own subtree of transactions or journal files).
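For illustration, a date-sharded journal tree following the scheme above might look like this (directory root, file names and counts are made up):

```
journal/
└── 2016/
    ├── 01/
    │   ├── 01/
    │   │   ├── 20160101T090000-1.txn
    │   │   └── 20160101T153000-2.txn
    │   └── 02/
    │       └── 20160102T101500-3.txn
    └── 02/
        └── ...
```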