AI Trustworthiness Center

14-2b. Have you developed measures to monitor changes in the data source?

• You can use methods such as web crawling to obtain training data for your AI model. Web crawlers built on open-source tools (e.g., Apache Nutch, Scrapy) can collect large amounts of data quickly. However, if the content of a crawled page changes in real time, or if the target page becomes inaccessible, the distribution of the collected data may be skewed, for example by leaving a particular class underrepresented.

• For AI systems that learn continuously from crawled data in real time, changes in data sources can directly affect system performance. You should therefore monitor the data collection process so that problems such as abnormal data sources or duplicate collection are detected and handled.
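The monitoring described above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a prescribed method: the `SourceMonitor` class, the `underrepresented_classes` helper, and the `min_share` threshold are hypothetical names and values introduced here. It assumes the crawler hands over raw page bytes (or `None` when a fetch fails) and that each collected record carries a class label.

```python
import hashlib
from collections import Counter


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest used to compare page bodies or records."""
    return hashlib.sha256(content).hexdigest()


class SourceMonitor:
    """Hypothetical helper that tracks source changes, failures, and duplicates."""

    def __init__(self):
        self.last_fingerprint = {}  # source URL -> digest of the last fetched body
        self.seen_records = set()   # digests of records already collected
        self.failures = Counter()   # source URL -> consecutive failed fetches

    def record_fetch(self, url, content):
        """Classify a fetch as 'unreachable', 'new', 'changed', or 'unchanged'.

        Pass content=None when the page could not be fetched.
        """
        if content is None:
            self.failures[url] += 1
            return "unreachable"
        self.failures[url] = 0
        digest = fingerprint(content)
        previous = self.last_fingerprint.get(url)
        self.last_fingerprint[url] = digest
        if previous is None:
            return "new"
        return "changed" if digest != previous else "unchanged"

    def is_duplicate(self, record: bytes) -> bool:
        """Return True if an identical record was already collected."""
        digest = fingerprint(record)
        if digest in self.seen_records:
            return True
        self.seen_records.add(digest)
        return False


def underrepresented_classes(labels, expected_classes, min_share=0.1):
    """List expected classes whose share of the collected labels is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return sorted(expected_classes)
    return sorted(c for c in expected_classes if counts[c] / total < min_share)
```

In use, a crawl loop would call `record_fetch` for each source, skip records for which `is_duplicate` returns True, and periodically pass the collected labels to `underrepresented_classes` to flag a skewed distribution. Exact hashing only catches byte-identical content, so pages with dynamic elements such as timestamps would likely need normalization before fingerprinting.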
