Launch of PyData Dublin at Workday

On Wednesday, 27th September 2017, PyData Dublin was launched! The inaugural event was hosted by Workday, who provided their panoramic sixth floor located in the heart of Smithfield, Dublin, along with a variety of refreshments (including pizza and beer) specially for the occasion. Members began to arrive and network from 18:00 until around 18:50, where after a brief introduction to PyData, our first speaker Prof. Barry Smyth began his talk.

Attendee badges awaiting collection in the reception of Workday. Organizers of PyData Dublin during setup for talks.

Running with Data: How Marathon Data can Help Runners to Achieve Faster Times.

Prof. Barry Smyth // Video * // Slides

* Note: Videos will be uploaded shortly.

Every year millions of people around the world run marathons. As race-day approaches, and training schedules begin to wind down, many participants will turn their attention to their race strategy. In this talk we will describe a machine learning approach to supporting runners in this process by helping them to select a suitable target finish-time and a race-plan to achieve it.

Barry Smyth describing how he used 'Scrapy', a Python scraping and web crawling framework, to collect the marathon data.

What better time to have a talk on using data for personal best prediction than with the Dublin Marathon just around the corner. During his talk, Barry describes how he first collected data from a large number of marathon websites using Scrapy, a Python scraping and web crawling framework. Then, using case based reasoning in combination with machine learning approaches (such as kNN, linear regression, random forests, and elastic nets), they were able to derive accurate predictions of personal best times across all levels of ability with less than 5% prediction error. Overall, the talk highlighted the huge potential that A.I. tools and techniques developed in other domains, such as recommender systems, have in the future of personalized and adaptive lifestyle assistants.

Barry Smyth presenting 'Running with Data: How Marathon Data can Help Runners to Achieve Faster Times.'

Topic Modelling with scikit-learn

Dr. Derek Greene // Video * // Slides and Code

* Note: Videos will be uploaded shortly.

Topic models aim to automatically discover the underlying thematic structure in a collection of documents. Popular approaches include Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorisation (NMF). In this talk I will describe how to use the Python scikit-learn package to preprocess a document collection, to apply both LDA and NMF to generate topic models, and then to explore the resulting models to get a better understanding of the contents of the collection.

Derek Greene presenting his talk 'Topic Modelling with Scikit-Learn'. Second image displays an example of a typical text preprocessing pipeline for a document corpus.

Dr. Derek Greene presented the second talk of the evening, where he walked through how easy it can be to efficiently perform text processing, representation, and topic modelling using Python’s scikit-learn package. In fact, he has provided all materials used for the talk as notebooks which can be found here. Interestingly, although the talk focused on applying topic modelling to documents, Derek noted that topic modelling has also been applied to non-textual datasets to. A recent example being Nathanael Aff’s: LEGO color themes as topic models. Finally, Derek touched on the practical issues associated with topic modelling, emphasizing how it’s best used as an exploratory tool to aid human interpretation and how domain knowledge is critical.

Derek Greene explaining the practical issues associated with topic modelling.

Workday - Venue, Food, and Drinks

A massive thanks again to Workday who provided the great venue, food, and drinks for the night! They’re currently looking for talent across a number of different technical roles ranging from software developers to data scientists. For more information check out their website here.

First image: Workday logo. Second image: Attendees of PyData Dublin enjoying the talks.

Want to Present or Host an Event?

Finally, if you’d like to present a talk or become a host for one of our events, please feel free to contact us through one of our many public forums (Twitter being the most active) or via email at:

We’re interested in a range of different talk types:

  • 5 minute lightning talks, from describing a library you’ve found useful, to things you wish you’d known when you first started, or even the top 5 packages that you couldn’t live without.
  • 15-20 minute talk with questions. Perhaps you’ve recently written a blog, project, or paper that relates to data analysis, management, processing, or visualization. Let us know and we can help you share it with the larger PyData community.
  • 40 minute tutorial with questions, not only would you like to share your work but you’re interested in teaching/providing examples of the implementation behind it to.

Tags:

Updated:

Leave a Comment