Top Performers in Data Science

Forum for Top Performance course users in Data Science
 

 

 NLP Data Sets?

dain



Posts : 2
Join date : 2015-10-28

Subject: NLP Data Sets?
Posted: Wed Oct 28, 2015 11:40 pm

Hey Theo & All,

I had a thought today. Would you consider the PDFs of all the WikiLeaks cables an interesting data set? Suppose you could extract the text from all of them; I'm wondering whether you could do some interesting natural language processing with it.

Do you think that could be interesting? I can't think of a practical use that I could state in a sentence, but it could be a good exercise for learning more about NLP. I guess it's a novelty thing: you could learn the same things from a data set like "text from every book in 2015", but the WikiLeaks cables just sound novel and fun, haha.
Admin



Posts : 14
Join date : 2015-10-27

Subject: Re: NLP Data Sets?
Posted: Thu Oct 29, 2015 12:18 am

There are a bunch of text data sets you can use; I'll try to dig up some alternatives for you this weekend. The difficulty with NLP is figuring out what you're trying to do. Build a better generative model? Get lower perplexity on a given corpus? And so on. I think the goal in NLP is actually pretty open. One interesting area being pushed forward is memory networks for question answering. You can also do sentiment analysis or community detection with Twitter data. The difficulty with the WikiLeaks data is that I can't think of much to do with it beyond running latent Dirichlet allocation (LDA). That said, if you could post a link or explain how to get the "text of all books", that would be great.
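
To make "lower perplexity on a given corpus" a bit more concrete, here is a rough sketch of how perplexity could be computed for the simplest possible language model, a unigram model with add-one smoothing. Everything in it is an assumption for illustration: the whitespace tokenizer, the toy training sentence, and the function names `tokenize`, `train_unigram`, and `perplexity` are made up here and are not from any particular library. A real experiment would use a proper tokenizer and a held-out test set.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_unigram(tokens):
    """Return per-token counts and the total number of training tokens."""
    counts = Counter(tokens)
    return counts, len(tokens)


def perplexity(counts, total, test_tokens):
    """Perplexity of a unigram model with add-one (Laplace) smoothing.

    One extra vocabulary slot is reserved for any word not seen in training.
    Assumes test_tokens is non-empty.
    """
    vocab_size = len(counts) + 1
    log_prob = 0.0
    for tok in test_tokens:
        p = (counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    # Perplexity is exp of the average negative log-probability per token.
    return math.exp(-log_prob / len(test_tokens))


# Toy training text (hypothetical, stands in for a real corpus).
train = tokenize("the cable was sent to the embassy and the embassy replied")
counts, total = train_unigram(train)

seen = perplexity(counts, total, tokenize("the embassy replied"))
unseen = perplexity(counts, total, tokenize("quantum llama pancakes"))
print(seen, unseen)
```

Text that looks like the training data gets a lower perplexity than text made of unseen words, which is the sense in which a "better" model of a corpus scores lower.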
https://tpdatascience.board-directory.net