NLP Demystified 6: TF-IDF and Simple Document Search

11,523 listens
Course playlist: https://www.youtube.com/playlist?list=PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS

We look at the problems with the previous bag-of-words approach, then use an improved technique (TF-IDF) to overcome them. In the demo, we'll use spaCy and scikit-learn to build TF-IDF vectors and a simple document search engine.

Colab notebook: https://colab.research.google.com/github/futuremojo/nlp-demystified/blob/main/notebooks/nlpdemystified_vectorization.ipynb#scrollTo=CnC_i4oH2ARW

Timestamps:
00:00:00 TF-IDF
00:00:15 The problem with binary/frequency bag-of-words
00:01:03 Using relative frequency instead
00:01:50 Term Frequency (TF)
00:03:14 Inverse Document Frequency (IDF)
00:03:54 Getting a word's TF-IDF score
00:04:52 Variations of TF-IDF
00:05:49 DEMO: creating TF-IDF vectors with scikit-learn
00:08:41 DEMO: querying a corpus and ranking results
00:11:04 Benefits and shortcomings of TF-IDF

This video is part of Natural Language Processing Demystified, a free, accessible course on NLP. Visit https://www.nlpdemystified.org/ to learn more.
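
To make the TF, IDF, and ranking steps listed in the timestamps concrete, here is a minimal hand-rolled sketch in plain Python. It is not the code from the Colab notebook, which uses spaCy and scikit-learn. The toy corpus, the naive tokenizer, and the helper names (tokenize, idf, tfidf, cosine, search) are all assumptions made for illustration, and the IDF used here is the plain log(N / df) form, one of several variations.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not the dataset used in the video).
corpus = [
    "The cat sat on the mat.",
    "Dogs and cats are popular pets.",
    "Stock markets fell sharply on Monday.",
]

def tokenize(text):
    # Naive tokenizer for illustration: lowercase, drop periods, split on spaces.
    return text.lower().replace(".", "").split()

docs = [tokenize(d) for d in corpus]
n_docs = len(docs)

# Document frequency: the number of documents each term appears in.
df = Counter(term for doc in docs for term in set(doc))

def idf(term):
    # Unsmoothed IDF, log(N / df). Terms never seen in the corpus get 0.
    return math.log(n_docs / df[term]) if df[term] else 0.0

def tfidf(tokens):
    # TF is the term's relative frequency within the document.
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf(t) for t, c in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc_vectors = [tfidf(d) for d in docs]

def search(query, top_k=2):
    # Score every document against the query and return the best matches
    # as (score, document text) pairs, highest score first.
    q = tfidf(tokenize(query))
    scored = [(cosine(q, v), text) for v, text in zip(doc_vectors, corpus)]
    return sorted(scored, reverse=True)[:top_k]

results = search("stock markets")
```

Calling search("stock markets") ranks the third document first because it is the only one containing either query term. Note that scikit-learn's TfidfVectorizer defaults to a smoothed IDF and L2-normalized vectors, so its scores will differ from this sketch.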