DATA CURATION AND ENHANCEMENT SAMPLES.
Image and PDF file processing.
Images are handled and improved carefully, like other data.
Images are sometimes overlooked in favor of other data. Yet, images require the same treatment as any other field.
| ✓ | Enhance the image: adjust brightness and contrast, remove artifacts, resize, rotate, and level it, reduce noise, increase sharpness, etc. |
| ✓ | Clean and insert data in EXIF fields. |
| ✓ | Insert watermarks. |
| ✓ | Remove empty, invalid, or broken images. Identify 'not available' placeholder images. |
| ✓ | Migrate image files into specific formats. |
| ✓ | Split multi-page images and files into separate parts, and vice versa. |
| ✓ | Identify unrelated images, or images containing information that belongs in other fields. |
| ✓ | Isolate images within PDF and other files. |
Brightness, contrast and exposure:
| Input: | Our output: |
| [image not shown] | [image not shown] |
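
As a rough illustration of a contrast correction, the sketch below applies a linear contrast stretch to a grayscale image held as a list of pixel rows. It is a minimal example under stated assumptions, not the pipeline used to produce the samples: the function name `stretch_contrast` and the list-of-lists image representation are hypothetical and chosen only to keep the example self-contained.

```python
def stretch_contrast(pixels, low=0, high=255):
    """Linearly rescale grayscale values so the darkest pixel maps to
    `low` and the brightest to `high` (hypothetical helper)."""
    flat = [v for row in pixels for v in row]
    darkest, brightest = min(flat), max(flat)
    if brightest == darkest:
        # Flat image: there is no range to stretch, so return a copy.
        return [row[:] for row in pixels]
    scale = (high - low) / (brightest - darkest)
    return [[round(low + (v - darkest) * scale) for v in row] for row in pixels]


# A dim, low-contrast 2x2 image: values span only 100..150.
dim = [[100, 110], [120, 150]]
stretched = stretch_contrast(dim)
print(stretched)
```

A real workflow would normally read and write actual image files with an imaging library; the arithmetic on each pixel stays the same.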
Rotation and leveling:
| Input: | Our output: |
| [image not shown] | [image not shown] |
Artifacts, noise and sharpness:
| Input: | Our output: |
| [image not shown] | [image not shown] |
Resizing and border removal:
| Input: | Our output: |
| [image not shown] | [image not shown] |
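
Border removal can be sketched as finding the bounding box of everything that differs from the border color. The example below assumes a grayscale image stored as a list of rows and assumes the top-left pixel carries the border color; `crop_uniform_border` and its `tolerance` parameter are hypothetical names used for illustration only.

```python
def crop_uniform_border(pixels, tolerance=0):
    """Crop away outer rows and columns whose pixels all match the
    top-left pixel value within `tolerance` (hypothetical helper)."""
    border = pixels[0][0]

    def is_content(value):
        return abs(value - border) > tolerance

    # Indices of rows and columns that contain at least one content pixel.
    rows = [i for i, row in enumerate(pixels) if any(is_content(v) for v in row)]
    cols = [j for j in range(len(pixels[0]))
            if any(is_content(row[j]) for row in pixels)]
    if not rows or not cols:
        return []  # The whole image is border color.
    return [row[cols[0]:cols[-1] + 1] for row in pixels[rows[0]:rows[-1] + 1]]


# A 2x2 picture surrounded by a one-pixel white frame.
framed = [
    [255, 255, 255, 255],
    [255, 10, 20, 255],
    [255, 30, 40, 255],
    [255, 255, 255, 255],
]
cropped = crop_uniform_border(framed)
print(cropped)
```

A small nonzero `tolerance` is one way to cope with borders that are not perfectly uniform, for example after lossy compression.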
Empty, broken and invalid:
| Input: | Our output: |
| [image not shown] | [Removed from dataset] |
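
A first-pass check for empty or invalid files can be done by inspecting the leading bytes of each file. The sketch below is an assumption about how such a filter might look, not the actual method behind the sample: `classify_image_bytes` and the `SIGNATURES` table are hypothetical, and only a few common formats are listed.

```python
# Well-known file signatures (leading "magic" bytes) for a few formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
}


def classify_image_bytes(data):
    """Return the detected format, or 'empty' / 'invalid' (hypothetical helper)."""
    if not data:
        return "empty"
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return "invalid"


files = {
    "logo.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 16,
    "scan.jpg": b"\xff\xd8\xff\xe0" + b"\x00" * 16,
    "blank.gif": b"",
    "error.jpg": b"<html>404 Not Found</html>",
}
# Keep only the files that look like a known format.
kept = {name for name, data in files.items()
        if classify_image_bytes(data) not in ("empty", "invalid")}
print(sorted(kept))
```

A matching signature does not prove the file decodes correctly; a truncated or corrupted image would still pass this check, so a full decode is a sensible second step.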
Placeholders:
| Input: | Our output: |
| [image not shown] | [Removed from dataset] |
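
One simple heuristic for spotting 'not available' placeholders is that the same file tends to be attached to many unrelated records. The sketch below flags records whose image bytes are byte-for-byte identical to a large share of the dataset. It is an illustrative assumption only: `find_placeholder_ids`, the `min_share` threshold, and the record IDs are hypothetical, and exact hashing will miss placeholders that were resized or re-encoded.

```python
import hashlib
from collections import Counter


def find_placeholder_ids(images, min_share=0.3):
    """Given {record_id: image_bytes}, return the sorted record IDs whose
    image is shared by at least `min_share` of all records and by at
    least two records (hypothetical helper)."""
    digests = {rid: hashlib.sha256(data).hexdigest() for rid, data in images.items()}
    counts = Counter(digests.values())
    threshold = max(2, min_share * len(images))
    suspects = {digest for digest, n in counts.items() if n >= threshold}
    return sorted(rid for rid, digest in digests.items() if digest in suspects)


# Stand-in byte strings; real input would be the contents of image files.
images = {
    "a": b"NO-IMAGE-AVAILABLE",
    "b": b"photo of product b",
    "c": b"NO-IMAGE-AVAILABLE",
    "d": b"photo of product d",
    "e": b"NO-IMAGE-AVAILABLE",
}
placeholders = find_placeholder_ids(images)
print(placeholders)
```

Flagged records still deserve a manual look, since a legitimately shared image (for example, a brand logo) would trigger the same rule.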
Multi-part:
| Input: | Our output: |
| [image not shown] | [images not shown] |
