Launch of PyData Dublin at Workday
On Wednesday, 27th September 2017, PyData Dublin was launched! The inaugural event was hosted by Workday, who provided their panoramic sixth floor located in the heart of Smithfield, Dublin, along with a variety of refreshments (including pizza and beer) specially for the occasion. Members began to arrive and network from 18:00 until around 18:50, where after a brief introduction to PyData, our first speaker Prof. Barry Smyth began his talk.
Running with Data: How Marathon Data can Help Runners to Achieve Faster Times.
Prof. Barry Smyth // Video * // Slides
* Note: Videos will be uploaded shortly.
Every year millions of people around the world run marathons. As race-day approaches, and training schedules begin to wind down, many participants will turn their attention to their race strategy. In this talk we will describe a machine learning approach to supporting runners in this process by helping them to select a suitable target finish-time and a race-plan to achieve it.
What better time to have a talk on using data for personal best prediction than with the Dublin Marathon just around the corner. During his talk, Barry describes how he first collected data from a large number of marathon websites using Scrapy, a Python scraping and web crawling framework. Then, using case based reasoning in combination with machine learning approaches (such as kNN, linear regression, random forests, and elastic nets), they were able to derive accurate predictions of personal best times across all levels of ability with less than 5% prediction error. Overall, the talk highlighted the huge potential that A.I. tools and techniques developed in other domains, such as recommender systems, have in the future of personalized and adaptive lifestyle assistants.
Topic Modelling with scikit-learn
Dr. Derek Greene // Video * // Slides and Code
* Note: Videos will be uploaded shortly.
Topic models aim to automatically discover the underlying thematic structure in a collection of documents. Popular approaches include Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorisation (NMF). In this talk I will describe how to use the Python scikit-learn package to preprocess a document collection, to apply both LDA and NMF to generate topic models, and then to explore the resulting models to get a better understanding of the contents of the collection.
Dr. Derek Greene presented the second talk of the evening, where he walked through how easy it can be to efficiently perform text processing, representation, and topic modelling using Python’s scikit-learn package. In fact, he has provided all materials used for the talk as notebooks which can be found here. Interestingly, although the talk focused on applying topic modelling to documents, Derek noted that topic modelling has also been applied to non-textual datasets to. A recent example being Nathanael Aff’s: LEGO color themes as topic models. Finally, Derek touched on the practical issues associated with topic modelling, emphasizing how it’s best used as an exploratory tool to aid human interpretation and how domain knowledge is critical.
Workday - Venue, Food, and Drinks
A massive thanks again to Workday who provided the great venue, food, and drinks for the night! They’re currently looking for talent across a number of different technical roles ranging from software developers to data scientists. For more information check out their website here.
Want to Present or Host an Event?
Finally, if you’d like to present a talk or become a host for one of our events, please feel free to contact us through one of our many public forums (Twitter being the most active) or via email at:
We’re interested in a range of different talk types:
- 5 minute lightning talks, from describing a library you’ve found useful, to things you wish you’d known when you first started, or even the top 5 packages that you couldn’t live without.
- 15-20 minute talk with questions. Perhaps you’ve recently written a blog, project, or paper that relates to data analysis, management, processing, or visualization. Let us know and we can help you share it with the larger PyData community.
- 40 minute tutorial with questions, not only would you like to share your work but you’re interested in teaching/providing examples of the implementation behind it to.
Leave a Comment