• Data labeling is the annotation of raw data with ground truth for training AI models, and it is carried out by many different labelers. Because labeling directly affects dataset quality and AI model performance, preparing detailed work instructions and training the labelers are crucial.
• The items to be labeled, the scope, the specific procedures, and the labeling tool can all differ depending on the data type. The general labeling process is outlined below. Workers must be trained, and guidelines must be provided for each step of the work procedure.
✔ Acquisition and cleansing of data: Acquire the raw data and cleanse it.
✔ Definition of the labeling target and scope: Define which items within the raw data are to be labeled and to what extent. In particular, specific standards must be prepared for each data type (e.g. partial labeling of data, de-identification of personal data, definition and management of classes).
✔ Establishment of labeling methods and procedures: Determine the work method (automated, semi-automated, or manual) according to the information that needs to be labeled, and prepare detailed work standards, including work allocation and labeling standards for each kind of data.
✔ Labeling: Perform data labeling after training the workers on the detailed work standards (select a labeling tool appropriate to the pre-determined work method, and conduct the relevant training in the case of automated or semi-automated work).
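The four steps above can be sketched in code to make the hand-offs between them concrete. The following is a minimal Python illustration, not a prescribed implementation: the names `WorkMethod`, `LabelingSpec`, `cleanse`, and `check_labels` are hypothetical, the cleansing rule (dropping empty and duplicate items) is only a stand-in for real cleansing, and the sample data is invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class WorkMethod(Enum):
    # The three work methods named in the process outline.
    AUTOMATED = "automated"
    SEMI_AUTOMATED = "semi-automated"
    MANUAL = "manual"


@dataclass
class LabelingSpec:
    """Hypothetical work standard for one data type (target, scope, method)."""
    data_type: str
    classes: frozenset
    method: WorkMethod
    deidentify_personal_data: bool = True


def cleanse(raw_items):
    """Acquisition and cleansing: drop empty and duplicate items.

    This is an assumed, simplified rule; real cleansing depends on the data type.
    """
    seen = set()
    cleaned = []
    for item in raw_items:
        text = item.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def check_labels(spec, labels):
    """Labeling check: return items whose label is not a class defined in the spec."""
    return {item: label for item, label in labels.items()
            if label not in spec.classes}


# Invented sample data, for illustration only.
spec = LabelingSpec(
    data_type="text",
    classes=frozenset({"positive", "negative"}),
    method=WorkMethod.MANUAL,
)
items = cleanse(["good product", "  ", "good product", "arrived broken"])
labels = {"good product": "positive", "arrived broken": "angry"}
invalid = check_labels(spec, labels)
```

Here `invalid` collects the labels that fall outside the defined classes, which is one simple way a written class definition can be enforced during the labeling step.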