Hi,
I want to write code to measure the similarity between two HTML (really text) documents. A Google search tells me that the typical way of doing this is to calculate the cosine similarity of their TF-IDF vectors:
"Classical approach from computational linguistics is to measure similarity based on the content overlap between documents." (http://text2vec.org/similarity.html)
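As far as I understand it, the calculation amounts to something like the Python sketch below (text, not a VI, and only a rough illustration). The function names are made up for this example, the tokenizer is deliberately naive and does not strip HTML tags, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 is an assumption on my part: with only two documents, an unsmoothed log(N / df) would give every shared word a weight of zero.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits; real HTML would need tag stripping first.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF (an assumption): shared terms keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1.0 for t, f in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine_similarity(a, b):
    # Dot product over the terms of a; terms missing from b contribute 0.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def document_similarity(text_a, text_b):
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    return cosine_similarity(vec_a, vec_b)
```

Calling document_similarity(text_a, text_b) on two strings returns a value between 0 (no words in common) and 1 (same word weights in both documents).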
Since I don't like to reinvent the wheel and I have very little time to learn this mathematical transformation, I wonder whether anyone has written a VI for this. A Google search has not turned up anything useful so far.
Thanks much!