Have you ever wondered how Wikipedia or Google Search answers our questions? Random thoughts hit us at random times, and we turn to search engines and online encyclopedias. Either way, we get our answers. Mostly.
Knowledge-intensive natural language processing (KI-NLP) is what lets Google Search or Wikipedia retrieve answers to our questions. The AI models behind them dig through an archive of information to return relevant results. However, the current KI-NLP landscape has several limitations.
KI-NLP architectures depend on black-box search engines to find information on the web. In the process, relevant information may be missed because the search engine’s algorithm ranks it too low in the results. In the case of Wikipedia, the online encyclopedia often does not capture all the knowledge available on the web about a particular topic, and as it continues to grow, it has become difficult to check citations and catch bias.
Meta’s Sphere
Meta came up with the first white-box search solution, Sphere, which uses open web data as its source of knowledge. Meta believes that Sphere’s white-box knowledge base contains significantly more data, and more sources to match against for verification, than a typical black-box knowledge source, so it can surface useful information that black-box sources cannot.
The idea was to create smarter AI systems that could better leverage real-world knowledge. Sphere has cleared the Knowledge Intensive Language Tasks (KILT) benchmark, suggesting that it can help AI researchers build models that draw on real-world knowledge to accomplish multiple tasks.
Sphere represents Meta’s effort to let AI researchers experiment with building KI-NLP models. Meta believes Sphere will help researchers train retrievers that handle a wider range of documents and prepare automated systems to deal with problems such as misinformation and inconsistent text. Models built this way could help fight harmful content in the real world. Sphere also has the potential to improve digital literacy and critical thinking skills.
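To make the white-box idea concrete, here is a minimal sketch of a retriever whose corpus and scoring are fully inspectable. It is a toy term-overlap ranker, not Sphere’s actual retrieval method; the `CORPUS` contents and the `search` function are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical three-document corpus standing in for an open web index.
CORPUS = {
    "doc1": "sphere is an open web corpus for retrieval",
    "doc2": "search engines rank pages with undisclosed signals",
    "doc3": "open corpus retrieval lets researchers inspect ranking",
}


def search(query, corpus=CORPUS):
    """Rank documents by term overlap and report which terms matched."""
    query_terms = set(query.lower().split())
    results = []
    for doc_id, text in corpus.items():
        counts = Counter(text.split())
        # Keep the per-term counts so the ranking can be audited.
        matched = {t: counts[t] for t in query_terms if t in counts}
        if matched:
            results.append((sum(matched.values()), doc_id, matched))
    # Highest score first; ties broken by document id for determinism.
    return sorted(results, key=lambda r: (-r[0], r[1]))
```

Because every result carries the terms that produced its score, anyone can see why a document ranked where it did, which is the property a black-box search engine does not offer.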
How Meta seeks to challenge Google with Sphere
Shortly after Meta released Sphere, talk started swirling that Meta was seeking to challenge Google.
“MetaAI introduces white-box search. It open-sources Sphere, its web-scale corpus. Directly challenges Google,” posted Prithivi Damodaran, ML consultant at Donkey Stereotype.
With Sphere, Meta tries to solve the problem of finding the source most relevant to a user’s query or topic. In an age where search engine optimization is widely used to push information resources up the rankings, what appears higher in search results may not be the most relevant source for the user. Google Search has drawn criticism on this front: users have often complained about wrong and irrelevant results, and the first results are frequently advertisements or information not even remotely related to the query. Sphere seeks to solve this problem.
Another way Meta tries to outdo Google with Sphere is by open-sourcing it. Big tech companies like Google have often been criticized for the opacity of their ML research. They give little information on how their models were created or what data was used, which has contributed to the AI replication crisis.
The replication crisis is a big problem because it can lead to several others. If an AI research team does not release any information about its model, the wider community cannot know whether the model was trained on a biased dataset. The model may then produce biased results once it is deployed in the real world. Take the case of Google Vision Cloud, which labeled an image of a dark-skinned individual holding a thermometer as a “pistol” while a similar image with a light-skinned individual was labeled as an “electronic device”.
Future trajectory
Whether Sphere unfolds as Meta hopes, only time will tell. However, Meta’s work on a web-scale corpus like Sphere shows that harnessing the vast textual resources available online through white-box retrieval could be NLP’s next big breakthrough.
Nevertheless, problems exist. One of the main issues Meta intends to address is the quality of the retrieved information. NLP models should be able to assess the quality of retrieved documents, manage duplicates, detect possible misstatements and contradictions, prioritize more reliable sources, and refrain from answering when the corpus contains no sufficiently good evidence.
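Some of the requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Meta’s method: the `Passage` type, the `RELIABILITY` weights, the source names, and the `min_score` threshold are all hypothetical. It covers duplicate handling, source prioritization, and abstention; detecting misstatements and contradictions is harder and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    source: str
    relevance: float  # hypothetical retriever score in [0, 1]


# Hypothetical reliability weights per source; a real system would have to
# learn or curate these.
RELIABILITY = {"encyclopedia.example": 0.9, "forum.example": 0.4}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial duplicates compare equal."""
    return " ".join(text.lower().split())


def select_evidence(passages, min_score=0.5):
    """Deduplicate, weight by source reliability, and drop weak evidence."""
    best = {}
    for p in passages:
        # Unknown sources get a low default weight (an assumption).
        score = p.relevance * RELIABILITY.get(p.source, 0.2)
        key = normalize(p.text)
        # For duplicates, keep only the copy with the highest weighted score.
        if key not in best or score > best[key][0]:
            best[key] = (score, p)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [(score, p) for score, p in ranked if score >= min_score]


def answer(passages, min_score=0.5):
    """Return the text of the best passage, or None to abstain."""
    evidence = select_evidence(passages, min_score)
    if not evidence:
        return None
    return evidence[0][1].text
```

Returning `None` when nothing clears the threshold is the simplest form of abstention: the caller can then say that no reliable answer was found instead of presenting weak evidence as fact.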