During a migration, you often have to process unstructured and structured data that is loaded from files on a local file system. The data might also be in a character set that differs from the database character set.
These files hold the following types of data:
Metadata – This data describes the file structure.
Semi-structured data – This data consists of textual strings in a specific format, such as JSON or XML. You might be able to make assertions about such data, such as "will always start with '<'" or "does not contain any newline characters."
Full text – This data usually contains all types of characters, including newline and quote characters. It might also consist of multibyte characters in UTF-8.
Binary data – This data might contain bytes or combinations of bytes, including nulls and end-of-file markers.
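
The distinctions above can be sketched as a rough classification step that runs before loading. The following Python example is only an illustration under stated assumptions: the function name `classify_payload`, the returned labels, and the heuristics (a null byte or a decoding failure means binary; a single-line payload that starts with `<`, `{`, or `[` means semi-structured) are hypothetical and are not part of any migration tool. A decoding failure can also mean that the wrong source character set was specified.

```python
def classify_payload(raw: bytes, encoding: str = "utf-8") -> str:
    """Roughly classify a file payload as binary, semi-structured, or full text.

    Hypothetical helper for illustration; the heuristics are assumptions.
    """
    # Null bytes are one of the markers of binary data named in the list above.
    if b"\x00" in raw:
        return "binary"

    # Decode using the source character set, which might differ from the
    # database character set. A failure here is treated as binary data,
    # although it can also indicate a wrong encoding argument.
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return "binary"

    # Example assertions for semi-structured data: starts with a known
    # opening character and contains no newline characters.
    stripped = text.strip()
    if stripped.startswith(("<", "{", "[")) and "\n" not in stripped:
        return "semi-structured"

    # Everything else is treated as full text, which might include
    # newlines, quote characters, and multibyte characters.
    return "full-text"


if __name__ == "__main__":
    samples = [
        b"\x00\x01\x02",
        b"<order><id>1</id></order>",
        'He said "h\u00e9llo"\nand left.'.encode("utf-8"),
    ]
    for sample in samples:
        print(classify_payload(sample))
```

When the files use a character set other than UTF-8, pass that character set as the `encoding` argument so that the text is decoded correctly before it is written to the database.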