NLP A-to-Z: From data collection to a fully trained model

Data is everywhere. It is used in every aspect of our lives: health, insurance, finance, travel, science, you name it. Collecting quality data efficiently is a challenging task: we need to define the right target in advance, aim at the right audience, and make sure the data quality is optimal. Good data can be used to train ML models, which later serve the applications above. In these talks you will learn how to gather quality data, starting with data scraping and data annotation and finishing with building the full model and its effect on the quality of results.

Data scraping
Itamar Abramovich, Director of Data Products, Bright Data

Fact: The internet is the largest database ever created. It is where our markets, our industries, and the public's reality unfold by the second. To remain competitive and relevant, every company, organization, and business must tap into web data. Even organizations that were once the most reluctant, such as banks and financial services firms, have now turned to web data. In this session, Bright Data Director of Data Products Itamar Abramovich will discuss and show real-life examples of why and how web data has made, and is still making, a huge difference in a company's growth strategy. Bright Data is the industry-leading web data platform, with over 15,000 customers and partners from across every industry. The company has made it its mission to deliver quality, reliable public web data with ease and simplicity. Join this expert presentation to learn up close how web data can help you solve some of your most critical challenges today.

Off-the-shelf solutions will only get you so far
Shay Hummel, Director of Knowledge Mining, SparkBeyond

While knowledge is power, it is often fragmented, disorganized, and inaccessible. SparkBeyond's Knowledge Mining system parses the web's wealth of unstructured data to deliver contextual answers to high-stakes problems. Our knowledge graph generator, Knomi, strings together multiple search engines with SOTA language models to produce structured responses. Off-the-shelf models produced impressive results, yet for many applications the accuracy was not sufficient. We needed to adapt the models to our needs. We will describe how we incorporate real-life labeled data to add a bespoke model on top of the off-the-shelf models. We will discuss the importance of diverse datasets and how this work improved accuracy and customer satisfaction.

End-to-end question answering on a handheld device for the benefit of people with reading difficulties
Tal Rosenwein, VP of R&D, AI and Algorithms, OrCam

Dyslexia affects 15-20% of the world's population. It is a language-based learning disability that results in difficulties with specific language skills, particularly reading. People with dyslexia usually also experience difficulties with other language skills, such as spelling, pronouncing words, and reading comprehension. In this talk, I will present a question-answering feature that helps improve comprehension. Using a voice-based interface, the user can query physical documents (e.g., books, newspapers) captured by the OrCam device, and the answer is played through the speakers. This feature incorporates models from multiple domains, such as computer vision (CV), optical character recognition (OCR), automatic speech recognition (ASR), natural language processing (NLP), and text to speech (TTS).

Get in touch with us
Join our Slack community: https://toloka.ai/community
Check out more of our events: https://toloka.ai/events
Read our Medium: https://medium.com/toloka

Follow us on social networks to make sure you won't miss any updates.
Twitter: https://twitter.com/TolokaAI
Facebook: https://facebook.com/globaltoloka
Linkedin: https://linkedin.com/company/toloka/

00:00:00 - Beginning
00:03:15 - Data scraping
00:30:20 - Off-the-shelf solutions will only get you so far
00:52:45 - End-to-end question answering on a handheld device for the benefit of people with reading difficulties
01:19:55 - Toloka with adaptive models