As most of you are aware, the postponed 2020 Summer Olympic Games in Tokyo took place this summer. What many of you might not be aware of is that during that same time, four prospective data analysts waited for their first day at Sopra Steria. Much like this summer’s Olympics, which took place with empty stands... Continue Reading →
How long is an Eternity?
I once heard about an old Arabic definition of eternity. The definition was: “Imagine a pyramid in the desert, a thousand meters high, a thousand meters long and a thousand meters wide, made of pure diamond. Every thousand years, a small bird comes by and pecks at the pyramid [once]. When the pyramid is gone,... Continue Reading →
Plotly dash-app hosted in Heroku – How to make visualization dashingly more interesting
In my previous blog post (Are there any language detection tools for assigning language to music data?), I described my failed attempts at concatenating Artist Origin (or, to be more precise, the artist's origin with respect to the language sung in) to a dataset created from Spotify's Web developer API. This information used to be available... Continue Reading →
Are there any language detection tools for assigning language to music data?
Music is a matter of taste, and some of us have (how should I put it?) different ideas of what is good music and what is trash that should never have seen the light of day. For a few years now, I have been a huge fan of Chinese Hip Hop and Rap (哈狗帮,龙井说唱 and 龍胆紫)... Continue Reading →
A very merry Markov Christmas
In the days leading up to Christmas and the holiday season, we thought it would be appropriate and interesting to use the tools in our analytical toolbox to take a closer look at one of the holiday's core ingredients. We are going to take an introductory look (or listen?) at... Continue Reading →
Working with large csv-files in pandas? Create a SQL-database by reading files in chunks
It is not uncommon to have to deal with, for instance, CSV files containing millions of rows. Searching, filtering and slicing can therefore be time-consuming tasks. So the question is: are there any ways to speed up the process? If so, this could save a considerable amount of time for any data scientist needing to... Continue Reading →
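As a rough illustration of the idea named in the title, here is a minimal sketch, not the post's own code, of reading a CSV file in chunks with pandas and appending each chunk to a SQLite table. The function name csv_to_sqlite, the table name and the default chunk size are assumptions made for this example; SQLite is likewise just one possible SQL backend.

```python
import sqlite3

import pandas as pd


def csv_to_sqlite(csv_path, db_path, table_name, chunksize=100_000):
    """Load a CSV file into a SQLite table chunk by chunk.

    Hypothetical helper for illustration; returns the number of rows written.
    """
    rows_written = 0
    conn = sqlite3.connect(db_path)
    try:
        # With chunksize set, read_csv yields DataFrames of at most
        # `chunksize` rows instead of loading the whole file into memory.
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            # Append each chunk; the table is created on the first write.
            chunk.to_sql(table_name, conn, if_exists="append", index=False)
            rows_written += len(chunk)
        conn.commit()
    finally:
        conn.close()
    return rows_written
```

Once the table exists, filtering can be pushed down to the database, for example with pd.read_sql_query("SELECT * FROM records WHERE a > 5", conn), so only the matching rows are loaded into memory.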
A Data Scientist’s take on Process Improvement
Two terms that rarely stand together: data analytics and process improvement. I started my career in project management and process improvement (Lean), and in my time as a management consultant and data scientist, I have repeatedly found myself in peculiar situations. The standard scenario is that our client is trying to solve problems on a... Continue Reading →
The Need for Intelligible Artificial Intelligence
Introduction: A few years ago, I gave a talk at a healthcare conference organized by Computer Sweden on the importance of AI for the future of healthcare. If I remember correctly, I described a breast cancer detection model I had constructed with the help of annotated data. Some people in the crowd were impressed while... Continue Reading →
Virus Spread Simulation Revisited – Population Attributes and Healthcare Resources
Introduction: In my latest blog post (Simulating a Virus Spread – What you can do to help healthcare cope), I described the importance of social distancing in a pandemic in order to minimize the load on healthcare services, a concept now widely known as "flattening the curve". I choose to revisit this... Continue Reading →
Simulating a Virus Spread – What you can do to help healthcare cope
Introduction - or why the net is flooded with the same types of descriptive statistics. I've been putting off this moment for quite a while now. Yes, you know that moment when you feel that everybody (and I mean EVERYBODY) has written something about Covid-19 and you ask yourself if you really should partake in the... Continue Reading →