Top Performers in Data Science

Forum for Top Performance course users in Data Science


 NLP Data Sets?



Posts : 2
Join date : 2015-10-28

PostSubject: NLP Data Sets?   Wed Oct 28, 2015 11:40 pm

Hey Theo & All,

I had an interesting thought today. Would you consider the PDFs of all the WikiLeaks cables an interesting data set? Suppose you could extract the text from all of them; I'm wondering whether you could use it for some interesting natural language processing.

Do you think that could be interesting? I can't think of a practical use for it that I could state in one sentence, but it could be a good exercise for learning more about NLP. I guess it's a novelty thing: you could learn the same things from a data set like "text from every book in 2015", but the WikiLeaks cables just sound novel and fun, haha.

Posts : 14
Join date : 2015-10-27

PostSubject: Re: NLP Data Sets?   Thu Oct 29, 2015 12:18 am

There are a bunch of text data sets you can use; I'll try to dig up some alternatives for you this weekend. The difficulty with NLP is figuring out what you're trying to do. Build a better generative model? Get lower perplexity on a given corpus? And so on. I think the goal in NLP is actually pretty open. One interesting area being pushed forward is memory neural networks for question answering. You can also do sentiment analysis or community detection with Twitter data. The difficulty with the WikiLeaks data is that I can't think of much to do with it beyond running latent Dirichlet allocation. At the same time, if you could post a link or explain how to get the "text of all books", that would be great.
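
To make the "lower perplexity for a given corpus" goal a bit more concrete, here is a minimal sketch of how perplexity could be measured. It assumes a very simple unigram model with add-one smoothing, and the function name and the toy sentences are made up for illustration; a real experiment would use a proper tokenizer and a stronger language model.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of an add-one-smoothed unigram model on test_tokens.

    Illustrative helper (not from any library). All words unseen in
    training share a single "unknown" slot in the vocabulary.
    """
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    counts = Counter(train_tokens)
    vocab_size = len(counts) + 1  # seen word types plus one unknown slot
    total = len(train_tokens)
    log_prob = 0.0
    for tok in test_tokens:
        # Add-one (Laplace) smoothed probability of this token.
        p = (counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    # Perplexity is exp of the average negative log-likelihood per token.
    return math.exp(-log_prob / len(test_tokens))


# Toy data standing in for text extracted from a corpus.
train = "the cable was sent to the embassy".split()
test = "the embassy sent the cable".split()
print(unigram_perplexity(train, test))
```

Lower values mean the model finds the test text less surprising, so comparing this number across models on the same held-out text is one way to frame "building a better model" for a corpus.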