| Forum | Topics | Posts | Last Posts |
|---|---|---|---|
| Collection of data science resources: blogs, websites, online books (please post links to books that are available online).
| 4 | 13 | Blogs and websit... Sat Oct 31, 2015 8:44 pm Ellen Koenig |
| Collection of data science text books.
| 1 | 2 | Data Science Tex... Thu Oct 29, 2015 8:03 pm darkruby501 |
| Data Science Software resources. While it's important to code your own implementations to gain a deeper understanding of the algorithm, we are most likely going to use publicly available implementations that have been heavily optimized and tested. Please aggregate software resources you've found on GitHub or other websites, each with a description. Please make a new post for each software resource so that potential users can ask questions.
| 0 | 0 | |
| There are thousands of papers in data science (a few dozen are submitted to arXiv daily), so let's try not to overload. Please only post papers that are particularly meaningful, that provide good summaries of the field, that you wish to discuss, or that are relevant to your project.
| 2 | 2 | 50 Years of Data... Sat Nov 07, 2015 2:47 pm Diran |
| Data sets are critically important: probably every project we design in this course will involve applying some data science algorithm to some data set. There are many publicly available data sets covering an enormous range of topics. Please post links and information about public data sets so that we can use them to create cool projects. Additionally, highlight the size of the data set in your post or title so that people can quickly determine whether they have the computational resources to handle it. I've decided to split data sets into three groups: Small: 0 - 32 GB (fits in RAM); Mid: 32 GB - 4 TB (fits on local disk); Large: 4+ TB (larger than local disk). This assumes that most people will be using commodity hardware. For a typical computer with 32 GB of RAM and 4 TB of disk storage, these sizes mark the points at which practitioners need to completely change their algorithmic approach to analysing the data. I recognize that some people may decide to use Amazon Web Services for their calculations or have access to commodity servers, high-end computers or small clusters. In that case these bounds aren't correct but the division still stands: data sets that cannot fit in RAM or on local disk have to be treated very differently. Please make a new post for each independent data set so that potential users can ask questions.
| 4 | 5 | ClueWeb09 Datase... Fri Oct 30, 2015 11:34 pm Admin |
| | Out-of-core data sets (32 GB - 20 TB). Data sets larger than 32 GB cannot fit in RAM, so we need special algorithms to deal with them. Please aggregate data sets of this size here. I'm assuming commodity hardware; if anyone has access to a server, these data set sizes are completely irrelevant.
| 0 | 0 | |
| | Data sets larger than 20 TB will probably need multiple computers and make use of message passing interfaces. If anyone finds data sets of this size, please post them here.
| 1 | 1 | ClueWeb09 Datase... Fri Oct 30, 2015 11:34 pm Admin |
| | This is to aggregate data sets that are small enough to fit in RAM: less than 32 GB in size.
| 3 | 4 | NLP Data Sets?... Thu Oct 29, 2015 12:18 am Admin |
| Miscellaneous topics, anything not covered above
| 2 | 21 | Introduce yourse... Wed Nov 04, 2015 4:44 pm justkeepswimming |
| | Hey everyone, I know we already posted our backgrounds in the top performance Facebook groups, but that thread is very long. If you have time, please introduce yourself and post a short summary of your interest in data science, what you've done and where you hope to go. We're all at different points in the field, so we may find that some people have already overcome the issues we now face or are confronted by similar problems.
| 1 | 18 | Introduce yourse... Wed Nov 04, 2015 4:44 pm justkeepswimming |
| | Please post any concerns or desired changes.
| 1 | 3 | Thread to share ... Wed Nov 04, 2015 10:18 am Yoori Choe |
| Coding, implementation problems, anything that's more software development than data science
| 2 | 3 | Nvidia's Introdu... Thu Oct 29, 2015 10:50 pm Yoori Choe |
| | GPU programming is increasingly important and is critical for deep learning. Please post any tutorials, resources, comments or questions about GPU programming here.
| 2 | 3 | Nvidia's Introdu... Thu Oct 29, 2015 10:50 pm Yoori Choe |
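
The Small / Mid / Large split in the data sets forum description can be turned into a small helper for tagging posts. This is only a sketch: the function name `size_group`, the decimal units (1 GB = 10^9 bytes) and the choice to put a size exactly on a boundary into the smaller group are all assumptions, since the description only gives the 32 GB and 4 TB cut-offs.

```python
# Hypothetical helper: maps a data set size in bytes to the three groups
# named in the data sets forum description (Small, Mid, Large).
# Assumption: decimal units, i.e. 1 GB = 10**9 bytes and 1 TB = 10**12 bytes.
GB = 10**9
TB = 10**12

SMALL_LIMIT = 32 * GB  # upper bound of "Small" (fits in RAM)
MID_LIMIT = 4 * TB     # upper bound of "Mid"


def size_group(size_bytes: int) -> str:
    """Return "Small", "Mid" or "Large" for a data set of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Assumption: a size exactly on a boundary falls into the smaller group.
    if size_bytes <= SMALL_LIMIT:
        return "Small"
    if size_bytes <= MID_LIMIT:
        return "Mid"
    return "Large"


if __name__ == "__main__":
    for size in (5 * GB, 500 * GB, 25 * TB):
        print(f"{size} bytes -> {size_group(size)}")
```

On a machine with different resources, the two limits would change but the three-way split stays the same, which is the point made in the forum description.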