Skip to content

Text attributes data truncation#893

Open
fogelito wants to merge 16 commits into
mainfrom
upsert-text-truncation
Open

Text attributes data truncation#893
fogelito wants to merge 16 commits into
mainfrom
upsert-text-truncation

Conversation

@fogelito

@fogelito fogelito commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

Release Notes

  • New Features

    • TEXT, MEDIUMTEXT, and LONGTEXT columns now enforce byte-length limits for UTF-8 content, preventing silent data truncation and providing clear error messages when capacity is exceeded.
  • Tests

    • Added comprehensive test coverage for byte-length validation across document creation, validation, and updates.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@fogelito, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 27 minutes and 2 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c52bc419-49a5-45f1-87fd-4b55f3f7f1fb

📥 Commits

Reviewing files that changed from the base of the PR and between cf959b1 and 05e58dd.

📒 Files selected for processing (1)
  • tests/e2e/Adapter/Scopes/DocumentTests.php
📝 Walkthrough

Walkthrough

Three new Database constants define TEXT-family byte capacities. A new ByteLength validator enforces strlen-based limits. Structure::checkForInvalidAttributeValues() adds dedicated cases for VAR_TEXT, VAR_MEDIUMTEXT, and VAR_LONGTEXT, each pairing a Text validator with ByteLength. Attribute::checkType(), MariaDB::getSQLType(), and SQLite byte-ceiling constants replace hardcoded literals with the new constants. Unit and E2E tests cover rejection of over-capacity multibyte strings and acceptance of in-limit values.

Changes

Byte-safe TEXT validation

Layer / File(s) Summary
Byte-size constants and ByteLength validator
src/Database/Database.php, src/Database/Validator/ByteLength.php
Database gains MAX_TEXT_BYTES, MAX_MEDIUMTEXT_BYTES, and MAX_LONGTEXT_BYTES constants. New ByteLength validator stores a max byte cap and enforces it via strlen, returning invalid for non-strings or strings exceeding the configured limit.
Validator and adapter integration
src/Database/Validator/Structure.php, src/Database/Validator/Attribute.php, src/Database/Adapter/MariaDB.php, src/Database/Adapter/SQLite.php
Structure::checkForInvalidAttributeValues() splits VAR_TEXT/VAR_MEDIUMTEXT/VAR_LONGTEXT into dedicated cases, each applying a Text validator and a ByteLength validator capped at the matching MAX_*_BYTES constant. Attribute::checkType() replaces hardcoded byte thresholds with the same constants. MariaDB::getSQLType() and SQLite byte-ceiling constants reference the constants instead of literals.
Unit tests
tests/unit/Validator/StructureTest.php
Adds testTextByteSafeValidationTooBig, testTextByteSafeValidationMultibyte, and testTextByteSafeValidationValid asserting byte-length enforcement for a VAR_TEXT attribute whose declared size exceeds the effective 65,535-byte capacity. Annotates the existing testTextValidation boundary value.
E2E tests
tests/e2e/Adapter/Scopes/DocumentTests.php
Adds testTextByteTruncationCreate, testTextByteTruncationValid, and testTextByteTruncationUpdate confirming that over-capacity multibyte emoji strings throw StructureException on createDocument() and updateDocument(), while a 65,535-byte ASCII value round-trips unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • utopia-php/database#788: Adds VAR_TEXT/VAR_MEDIUMTEXT/VAR_LONGTEXT mappings in MariaDB::getSQLType(), which this PR further modifies to use the new Database::MAX_*TEXT_BYTES thresholds.

Suggested reviewers

  • abnegate
  • ArnabChatterjee20k

Poem

🐰 Hop, hop, bytes in a row,
No emoji too large shall overflow!
ByteLength guards the TEXT with care,
65535 bytes — not one more dare.
The rabbit checks strlen and grins,
utf8mb4 safety wins! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.05% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Text attributes data truncation' clearly describes the main change: the PR adds byte-length validation and prevention of truncation for TEXT-family column types, directly addressing the core objective of enforcing text truncation boundaries.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch upsert-text-truncation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/unit/Validator/StructureTest.php (2)

412-413: 💤 Low value

Minor: Multibyte test lacks error message assertion.

For consistency with the ASCII overflow test on line 409, consider asserting the error message for the multibyte test as well.

🔍 Add assertion for consistency
         // Multi-byte content over the limit is rejected the same way.
         $multibyte = \str_repeat('📝', 20000);
         $this->assertEquals(false, $validator->isValid(new Document($base + ['blocks_json' => $multibyte])));
+        $this->assertEquals('Invalid document structure: Attribute "blocks_json" has invalid type. Value must be a valid string and no longer than 16383 chars', $validator->getDescription());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/Validator/StructureTest.php` around lines 412 - 413, The multibyte
overflow test starting at the line with str_repeat and '📝' is missing an
assertion for the error message, unlike the ASCII overflow test on line 409.
After the assertEquals call that validates isValid returns false for the
multibyte Document, add an additional assertion to check the error message
returned by the validator, following the same pattern as the ASCII overflow test
for consistency.

995-1047: 💤 Low value

Consider adding test case for byte-safe limit edge.

While testTextByteSafeValidation comprehensively tests the 16,383-character byte-safe limit for legacy attributes, testTextValidation could benefit from a similar edge-case test. Currently it tests 16,383 chars (pass) and 65,536 chars (fail due to declared size), but doesn't verify that 16,384 chars would fail due to the byte-safe limit.

🧪 Optional test case to add

Add this test case between the existing assertions to verify byte-safe enforcement:

        $this->assertEquals(false, $validator->isValid(new Document([
            '$collection' => ID::custom('posts'),
            'title' => 'Demo Title',
            'description' => 'Demo description',
            'rating' => 5,
            'price' => 1.99,
            'published' => true,
            'tags' => ['dog', 'cat', 'mouse'],
            'feedback' => 'team@appwrite.io',
            'text_field' => \str_repeat('a', 16384),
            '$createdAt' => '2000-04-01T12:00:00.000+00:00',
            '$updatedAt' => '2000-04-01T12:00:00.000+00:00'
        ])));

        $this->assertEquals('Invalid document structure: Attribute "text_field" has invalid type. Value must be a valid string and no longer than 16383 chars', $validator->getDescription());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/Validator/StructureTest.php` around lines 995 - 1047, The
testTextValidation method should include an additional test case to verify the
byte-safe character limit edge case. Add a new assertion between the existing
test cases that validates a Document with text_field set to \str_repeat('a',
16384) should fail (assertEquals false), and verify the corresponding error
message from getDescription indicates the byte-safe limit of 16383 characters,
similar to the pattern already established in the method with the 65536
character test case.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/Database/Database.php`:
- Around line 118-121: The MAX_LONGTEXT_BYTES constant in the Database class is
set to a value that exceeds the 32-bit signed integer maximum, causing it to be
converted to a float in 32-bit PHP environments. This breaks the intdiv call in
downstream code that expects integer arguments. Fix this by explicitly casting
MAX_LONGTEXT_BYTES to int using the (int) type cast operator to ensure it
remains an integer on all PHP platforms, including 32-bit environments.

---

Nitpick comments:
In `@tests/unit/Validator/StructureTest.php`:
- Around line 412-413: The multibyte overflow test starting at the line with
str_repeat and '📝' is missing an assertion for the error message, unlike the
ASCII overflow test on line 409. After the assertEquals call that validates
isValid returns false for the multibyte Document, add an additional assertion to
check the error message returned by the validator, following the same pattern as
the ASCII overflow test for consistency.
- Around line 995-1047: The testTextValidation method should include an
additional test case to verify the byte-safe character limit edge case. Add a
new assertion between the existing test cases that validates a Document with
text_field set to \str_repeat('a', 16384) should fail (assertEquals false), and
verify the corresponding error message from getDescription indicates the
byte-safe limit of 16383 characters, similar to the pattern already established
in the method with the 65536 character test case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 338e4613-aa3d-423d-af38-75776ac9f9df

📥 Commits

Reviewing files that changed from the base of the PR and between cfba533 and cbdc3cf.

📒 Files selected for processing (4)
  • src/Database/Database.php
  • src/Database/Validator/Structure.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
  • tests/unit/Validator/StructureTest.php

Comment thread src/Database/Database.php Outdated
@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

  • Adds byte-length validation for TEXT, MEDIUMTEXT, and LONGTEXT-style attributes.
  • Introduces shared byte-capacity constants and a ByteLength validator based on encoded string size.
  • Updates MariaDB/SQLite adapter limit references and structure validation paths.
  • Adds unit and E2E coverage for oversized multibyte values and valid full-capacity text values.

Confidence Score: 5/5

The change is narrowly scoped to text attribute capacity validation and includes targeted coverage for creation, validation, and update paths.

No review comments were identified, and the implementation is backed by tests covering multibyte byte-length boundaries and adapter limit references.

T-Rex T-Rex Logs

What T-Rex did

  • T-Rex attempted the text-byte-limits harness on the base checkout, but the run exited with code 127 due to missing PHP.
  • T-Rex attempted the harness after switching to the head checkout, and the run again exited with code 127 due to missing PHP.
  • The blocker is an environmentalRuntime availability issue, as both checks show php not found, so no product behavior was observed.

View all artifacts

T-Rex Ran code and verified through T-Rex

Reviews (8): Last reviewed commit: "messages" | Re-trigger Greptile

Comment thread src/Database/Validator/Structure.php Outdated
Comment thread src/Database/Validator/Structure.php Outdated
Comment thread src/Database/Validator/Structure.php Outdated
Comment thread src/Database/Validator/Structure.php Outdated
Comment thread tests/unit/Validator/StructureTest.php Outdated
Comment thread src/Database/Adapter/MariaDB.php
Comment thread src/Database/Validator/Structure.php
Comment thread tests/e2e/Adapter/Base.php Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/unit/Validator/StructureTest.php (1)

441-445: ⚡ Quick win

Assert the exact failure reason in the multibyte overflow test.

This test currently validates only false; asserting the description too will lock the byte-limit contract and catch regressions in failure source.

Suggested test hardening
 $multibyte = \str_repeat('📝', 20000);
 $this->assertEquals(false, $validator->isValid(new Document($base + ['text' => $multibyte])));
+$this->assertEquals(
+    'Invalid document structure: Attribute "text" has invalid type. Value must be a valid string no longer than 65535 bytes',
+    $validator->getDescription()
+);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/Validator/StructureTest.php` around lines 441 - 445, The multibyte
overflow test in the test method currently only asserts that the validator
returns false but does not verify the specific failure reason. Strengthen this
test by adding an additional assertion that checks the failure description or
error message from the validator after the isValid call with the multibyte
document. This will ensure that the specific byte-limit contract is properly
validated and catch any regressions where the failure source changes. Reference
the validator's method to retrieve the error description (commonly methods like
getError, getFailureDescription, or similar) to assert the expected error
message alongside the false result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@phpunit.xml`:
- Line 10: The stopOnFailure attribute in the phpunit.xml configuration is set
to "true", which causes the test suite to halt immediately upon encountering the
first failure. This masks subsequent failures and reduces diagnostic value in CI
environments. Change stopOnFailure from "true" to "false" to allow the full test
suite to execute and report all failures in a single run, providing better
visibility into the overall test health.

In `@src/Database/Validator/ByteLength.php`:
- Line 53: The file is missing a trailing newline at the end of the file, which
violates PSR-12 standards that require a single blank line at the end of every
file. Add a single newline character after the closing brace at the end of the
ByteLength.php file to satisfy the Pint code style checker and resolve the
single_blank_line_at_eof CI failure.

In `@tests/e2e/Adapter/Base.php`:
- Around line 26-27: Uncomment the two trait usage statements in the Base class
by removing the leading // from the lines containing CollectionTests and
CustomDocumentTypeTests. These commented-out trait usages need to be re-enabled
so that their test methods are included in the test discovery and execution,
restoring the full E2E coverage that these traits provide to the adapter tests.

---

Nitpick comments:
In `@tests/unit/Validator/StructureTest.php`:
- Around line 441-445: The multibyte overflow test in the test method currently
only asserts that the validator returns false but does not verify the specific
failure reason. Strengthen this test by adding an additional assertion that
checks the failure description or error message from the validator after the
isValid call with the multibyte document. This will ensure that the specific
byte-limit contract is properly validated and catch any regressions where the
failure source changes. Reference the validator's method to retrieve the error
description (commonly methods like getError, getFailureDescription, or similar)
to assert the expected error message alongside the false result.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7d8392e1-0183-4343-9aef-8defb270cf4f

📥 Commits

Reviewing files that changed from the base of the PR and between 24cb977 and 3dc94f9.

📒 Files selected for processing (6)
  • phpunit.xml
  • src/Database/Validator/ByteLength.php
  • src/Database/Validator/Structure.php
  • tests/e2e/Adapter/Base.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
  • tests/unit/Validator/StructureTest.php

Comment thread phpunit.xml Outdated
Comment thread src/Database/Validator/ByteLength.php Outdated
Comment thread tests/e2e/Adapter/Base.php Outdated
@fogelito fogelito changed the title Upsert text truncation Text attributes data truncation Jun 18, 2026
Comment on lines +44 to +46
private const MARIADB_TEXT_BYTES = '' . Database::MAX_TEXT_BYTES;
private const MARIADB_MEDIUMTEXT_BYTES = '' . Database::MAX_MEDIUMTEXT_BYTES;
private const MARIADB_LONGTEXT_BYTES = '' . Database::MAX_LONGTEXT_BYTES;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be a string?

case Database::VAR_LONGTEXT:
$validators[] = new Text($size, min: 0);
$validators[] = new ByteLength(Database::MAX_LONGTEXT_BYTES);
break;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need both?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants