Content tracking with configurable txn batch size #427

Open
AFaust wants to merge 1 commit into Alfresco:master from AFaust:pr/faster-content-tracking-on-sparse-txn-reindex

@AFaust AFaust commented Jun 9, 2023

This pull request adds two optional configuration settings to the content tracking process that allow users / customers to

  • set a transaction ID lookup offset instead of relying on a hard-coded 500 offset
  • set a transaction ID processing batch size for collecting documents to be content-indexed

The purpose of these configurations is to allow optimisation of (re-)indexing on Alfresco systems with extremely sparse transaction / content update distributions, e.g. systems undergoing many fine-grained updates, where transactions carrying indexable content updates may be spread far apart. In such systems, the costly getDocsWithUncleanContent operation may be invoked often while yielding only a single- to low double-digit number of content-containing nodes to index per call. This can significantly prolong content indexing, because the phases of concurrent content indexing are very short and may not even use all of the threads allowed in the fork-join pool.
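
To make the effect of sparsity concrete, here is a rough back-of-the-envelope sketch in Java. It is not the tracker's actual code: it assumes one costly lookup per fixed-size window of transaction IDs, and the class name, method names, and all figures (one million transactions, 2% of them carrying indexable content) are hypothetical.

```java
public class SparseTxnBatchingSketch {

    /**
     * Number of fixed-size transaction ID windows needed to cover
     * [minTxnId, maxTxnId]. Each window stands in for one invocation
     * of the costly lookup (an assumption of this sketch).
     */
    public static long windowsNeeded(long minTxnId, long maxTxnId, long batchSize) {
        long span = maxTxnId - minTxnId + 1;
        return (span + batchSize - 1) / batchSize; // ceiling division
    }

    /** Average number of content-bearing transactions found per window. */
    public static double avgHitsPerWindow(long contentTxnCount, long windows) {
        return (double) contentTxnCount / windows;
    }

    public static void main(String[] args) {
        long minTxnId = 1;
        long maxTxnId = 1_000_000;  // hypothetical transaction range
        long contentTxns = 20_000;  // hypothetical: 2% carry indexable content

        for (long batchSize : new long[] {500, 5_000}) {
            long windows = windowsNeeded(minTxnId, maxTxnId, batchSize);
            System.out.printf("batchSize=%d windows=%d avgHitsPerWindow=%.1f%n",
                    batchSize, windows, avgHitsPerWindow(contentTxns, windows));
        }
    }
}
```

Under these assumed numbers, a window of 500 transactions yields about 10 content-bearing transactions per lookup, while a window of 5,000 yields about 100 with a tenth of the lookups, giving each concurrent indexing phase more work to distribute.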

Both options default to the previously hard-coded value of 500 transactions, so behaviour is unchanged from previous versions unless they are configured. Users / customers with sparse transaction / content update distributions may configure substantially higher values as needed. I would personally recommend that Alfresco consider a default for alfresco.content.txnIdLookupBatchSize that is perhaps an order of magnitude larger than that for alfresco.content.txnProcessingBatchSize. Due to backward-compatibility concerns, I have not included that change in this PR.
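
As an illustration of the default behaviour, the following Java sketch reads the two property keys named above and falls back to 500 when a key is absent. The class and helper method are hypothetical and are not taken from the PR; where the properties are actually defined and how the tracker reads them is not shown here, and the value 5000 is only an example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ContentTrackingConfigSketch {

    // Property keys as named in the PR description.
    static final String LOOKUP_KEY = "alfresco.content.txnIdLookupBatchSize";
    static final String PROCESSING_KEY = "alfresco.content.txnProcessingBatchSize";

    // Previously hard-coded value, kept as the default for both options.
    static final int DEFAULT_BATCH_SIZE = 500;

    /** Returns the configured value for key, or the default if it is unset or blank. */
    public static int readBatchSize(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_BATCH_SIZE;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) throws IOException {
        // Example configuration: only the lookup size is raised (illustrative value).
        Properties props = new Properties();
        props.load(new StringReader("alfresco.content.txnIdLookupBatchSize=5000\n"));

        System.out.println("lookup=" + readBatchSize(props, LOOKUP_KEY));
        System.out.println("processing=" + readBatchSize(props, PROCESSING_KEY));
    }
}
```

With only the lookup size set, the processing batch size stays at 500, which mirrors the order-of-magnitude gap suggested above.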

CLAassistant commented Jun 9, 2023

CLA assistant check
All committers have signed the CLA.

