Blog

Web Data Discovery and Sourcing Approaches for Text Analytics

The web data discovery and sourcing problem is multifaceted, spanning intellectual property ownership and control, volume, ethics, precision and authenticity. A knowledge base should at the least restrict itself to resources that assure a degree of credibility, authenticity, repeatability, reliability and quality. Just because something is on the web doesn’t mean it’s easy to find. Most of the time, we know what we want but not where to find it or how to use it. Read more...

Duplicate Data Resolution Techniques

Attack the problem at the source by preventing duplication at data entry. Manual data entry is perhaps the entry point most culpable for data duplication. Guidelines, standardized templates and a strong review system, together with an alert that warns data operators of possible duplicates already in the system when a new entry is added, will ameliorate some of the pain. Naturally, the next step is to provide a tool to check existing records for duplicate data. In the rest of this article, I will analyze the problems with the solutions currently in use and propose my own improvements to them. Read more...
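
To make the entry-time alert concrete, here is a minimal sketch of one way it could work. It is not the approach the full article proposes. It assumes records are plain strings, and the function names, the sample records and the 0.85 similarity threshold are all hypothetical choices made for illustration. It normalizes each value and compares it with existing ones using Python's standard difflib module.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def find_possible_duplicates(new_entry, existing, threshold=0.85):
    """Return (record, score) pairs that resemble new_entry, best match first.

    The threshold is an assumed value; it would need tuning on real data.
    """
    key = normalize(new_entry)
    matches = []
    for record in existing:
        score = SequenceMatcher(None, key, normalize(record)).ratio()
        if score >= threshold:
            matches.append((record, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Hypothetical sample records, used only to show the call.
existing_records = ["Acme Corporation", "Globex Inc.", "Initech LLC"]
warnings = find_possible_duplicates("ACME  Corporation.", existing_records)
for record, score in warnings:
    print(f"Possible duplicate: {record!r} (similarity {score:.2f})")
```

A data entry form could call such a check before saving and ask the operator to confirm when the list is not empty. Comparing every new entry against every existing record is linear in the number of records, so a larger system would likely need an index or a blocking key in front of it.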