NLP Demystified 6: TF-IDF and Simple Document Search

Course playlist: https://www.youtube.com/playlist?list=PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS

We look at the problems with the bag-of-words approach covered previously, then use an improved technique (TF-IDF) to overcome them. In the demo, we use spaCy and scikit-learn to build TF-IDF vectors and a simple document search engine.
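As a rough illustration of the kind of search engine the demo describes, here is a minimal sketch using scikit-learn only. The toy corpus, the `search` helper, and the `top_k` parameter are hypothetical and are not taken from the course notebook. The sketch also skips the spaCy tokenization step and relies on `TfidfVectorizer`'s default tokenizer and settings (smoothed IDF, L2-normalized vectors), which may differ from what the video uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus (hypothetical; not the corpus used in the course notebook).
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]

# Learn the vocabulary and IDF weights, then turn each document into a TF-IDF vector.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)  # sparse matrix: one row per document


def search(query, top_k=2):
    """Rank corpus documents by cosine similarity to the query (hypothetical helper)."""
    # Reuse the fitted vocabulary and IDF weights for the query.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    # Indices of the highest-scoring documents, best first.
    ranked = scores.argsort()[::-1][:top_k]
    return [(corpus[i], float(scores[i])) for i in ranked]


results = search("stock markets")
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Because the query is transformed with the same fitted vectorizer, query terms that never appeared in the corpus are simply ignored rather than causing an error.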

Colab notebook: https://colab.research.google.com/github/futuremojo/nlp-demystified/blob/main/notebooks/nlpdemystified_vectorization.ipynb#scrollTo=CnC_i4oH2ARW

Timestamps:
00:00:00 TF-IDF
00:00:15 The problem with binary/frequency bag-of-words
00:01:03 Using relative frequency instead
00:01:50 Term Frequency (TF)
00:03:14 Inverse Document Frequency (IDF)
00:03:54 Getting a word's TF-IDF score
00:04:52 Variations of TF-IDF
00:05:49 DEMO: creating TF-IDF vectors with scikit-learn
00:08:41 DEMO: querying a corpus and ranking results
00:11:04 Benefits and shortcomings of TF-IDF
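
The TF, IDF, and TF-IDF steps listed in the timestamps can be sketched in plain Python. This is one common textbook variant (term count divided by document length, and the natural log of the number of documents over the document frequency). The exact formulation used in the video is not given here, so treat the formulas, the function names, and the toy documents below as assumptions.

```python
import math
from collections import Counter

# Toy tokenized documents (hypothetical data for illustration only).
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["cats", "and", "dogs"],
]


def term_frequency(term, doc):
    """Relative frequency of `term` within a single document."""
    return Counter(doc)[term] / len(doc)


def inverse_document_frequency(term, docs):
    """log(N / df): higher for terms that appear in fewer documents."""
    df = sum(1 for doc in docs if term in doc)
    if df == 0:
        return 0.0  # assumption: unseen terms contribute nothing
    return math.log(len(docs) / df)


def tf_idf(term, doc, docs):
    """TF-IDF score of `term` in `doc`, relative to the whole collection."""
    return term_frequency(term, doc) * inverse_document_frequency(term, docs)


# "the" is frequent in the first document but common across the corpus,
# while "cat" is rarer across the corpus and so receives the higher score.
print(tf_idf("the", docs[0], docs))
print(tf_idf("cat", docs[0], docs))
```

Library implementations typically add smoothing and normalization on top of this, so their scores will not match these numbers exactly.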

This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.

Visit https://www.nlpdemystified.org/ to learn more.
