Monday, November 7, 2016

Something For Your Mind, Polymath Podcast episode 3

"Improving your communications and Entrepreneurship"

In this episode, we will cover these two topics. Specifically, we will talk about kinematic displays, improving your communications, coworking spaces, entrepreneurship, innovation and more. The episode wraps up with a "learn more" section on what an entrepreneur is.
"these are people that can't help but generate ideas all the time..."

A few selected links are on the episode web page, including a few items related to colors (some for all platforms, others Linux only).

Something for your mind is available on


Francois Dion

Thursday, October 20, 2016

Stemgraphic, a new visualization tool

PyData Carolinas 2016

At PyData Carolinas 2016 I presented the talk Stemgraphic: A Stem-and-Leaf Plot for the Age of Big Data.


The stem-and-leaf plot is one of the most powerful tools not found in a data scientist’s or statistician’s toolbox. If we go back in time thirty-some years, we find the exact opposite. What happened to the stem-and-leaf plot? Finding the answer led me to design and implement an improved graphical version of the stem-and-leaf plot, as a Python package. As a companion to the talk, a printed research paper was provided to the audience (a PDF is now available through

The talk

Thanks to the organizers of PyData Carolinas, videos of all the talks and tutorials have been posted on YouTube. At just 30 minutes, the talk is a great way to learn more about stemgraphic and the history of the stem-and-leaf plot for EDA work. This updated version does include the animated intro sequence, but unfortunately the sound was recorded from the microphone and not the mixer. You can see the intro sequence in higher audio and video quality on the main page of the website below.

I've created a web site for stemgraphic, where I'll be posting tutorials and demos of some of the more advanced features. In particular, I'll show how stemgraphic can be used in a data science pipeline: as a data wrangling tool, as an intermediary to big data on HDFS, as a visual validation for building models, and as a superior distribution plot, especially when faced with non-uniform distributions or distributions showing a high degree of skewness (long tails).
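For readers who have never met a stem-and-leaf plot, here is a minimal text-only sketch of the classic version. It does not use stemgraphic or its API; the function name `stem_and_leaf` and the sample values are made up for illustration, and it assumes non-negative integers with the last digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a basic text stem-and-leaf plot.

    Assumes non-negative integers; the last digit is the leaf and the
    remaining leading digits form the stem.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    # Include empty stems so that gaps in the data stay visible.
    return [
        f"{stem:>3} | {''.join(rows.get(stem, []))}"
        for stem in range(min(rows), max(rows) + 1)
    ]


# Hypothetical sample data, only for the demonstration.
sample = [12, 15, 21, 23, 23, 40, 41, 57]
for row in stem_and_leaf(sample):
    print(row)
```

Each row keeps the actual digits of the data, which is what sets the plot apart from a histogram: you see the shape of the distribution and the values at the same time.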

Github Repo

Francois Dion

Tuesday, October 11, 2016

PyData Carolinas 2016 Tutorial: Datascience on the web

PyData Carolinas 2016

Don Jennings and I presented a tutorial at PyData Carolinas 2016: Datascience on the web.

The plan was as follows:


Learn to deploy your research as a web application. You have been using Jupyter and Python to do some interesting research, build models, visualize results. In this tutorial, you’ll learn how to easily go from a notebook to a Flask web application which you can share.


Jupyter is a great notebook environment for Python based data science and exploratory data analysis. You can share the notebooks via a github repository, as html or even on the web using something like JupyterHub. How can we turn the work we have done in the notebook into a real web application?
In this tutorial, you will learn to structure your notebook for web deployment, how to create a skeleton Flask application, add a model and add a visualization. While you have some experience with Jupyter and Python, you do not have any previous web application experience.
Bring your laptop and you will be able to do all of these hands-on things:
  1. get to the virtual environment
  2. review the Jupyter notebook
  3. refactor for reuse
  4. create a basic Flask application
  5. bring in the model
  6. add the visualization
  7. profit!
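As a rough idea of what steps 4 and 5 can look like, here is a minimal sketch of a skeleton Flask application with a model behind one route. It is not the code from the tutorial repo: the `predict` function is a hypothetical stand-in for a trained model, and the route names are assumptions chosen for the example.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(x):
    """Hypothetical stand-in for a trained model loaded from disk."""
    return 2.0 * x + 1.0


@app.route("/")
def index():
    # Step 4: the most basic route, just to prove the app is alive.
    return "Datascience on the web"


@app.route("/predict")
def predict_route():
    # Step 5: read ?x=... from the query string (0.0 when absent)
    # and return the model output as JSON.
    x = request.args.get("x", default=0.0, type=float)
    return jsonify({"x": x, "prediction": predict(x)})


# Exercise the routes without starting a server, using Flask's test client.
with app.test_client() as client:
    print(client.get("/").get_data(as_text=True))
    print(client.get("/predict?x=2").get_json())
```

The test client is used here so the sketch runs as a plain script; in a real deployment the model would be loaded once at startup rather than defined inline.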
Now that it has been presented, the artifacts are a github repo and a youtube video.

Github Repo

After the fact

The unrefactored notebook is here while the refactored one is here.
Once you run through the whole refactored notebook, you will have train and test sets saved in data/ and a trained model in trained_models/. To make these available in the tutorial directory, you will have to run the script. On a Unix-like environment (Mac, Linux, etc.):
chmod a+x


The whole session is now on youtube: Francois Dion & Don Jennings Datascience on the web

Francois Dion

Thursday, October 6, 2016

Improving your communications: Professional Audio-Video Production on Linux

Pro AV on Linux

I'll be presenting on the subject of Professional Audio-Video Production on Linux, next week at TriLug.

From concept to finished product, it has never been easier to obtain professional results when it comes to audio-video production on Linux.

We will cover some of the hardware that should be part of your production suite, from microphones to jog wheels and highlight some of the top tools for animation, audio, broadcasting, effects, modeling, music, transcoding and video. We will also go beyond the usual suspects and introduce some tools that might not be typically used for AV production.
By the end of the presentation, you will have all the tools you need to improve the quality of your communications, for your personal enjoyment, your career, or your business.


Thursday, 13 October 2016 - 7:00pm to 9:00pm
The Frontier, 800 Park Offices Drive, Durham, NC
Francois Dion

Wednesday, October 5, 2016

Something For Your Mind, Polymath Podcast episode 2

A is for Anomaly

In this episode, "A is for Anomaly", our first of the alphabetical episodes, we cover financial fraud, the Roman quaestores, outliers, PDFs and EKGs. Bleep... Bleep... Bleep...
"so perhaps this is not the ideal way of keeping track of 15 individuals..."

Something for your mind is available on



Francois Dion
P.S. There is a bit more detail on this podcast as a whole, on linkedin.

Friday, September 30, 2016

5 music things

5 in 5

I like to cover 5 things in 5 minutes for lightning talks. Or one thing. At the local
Python user group, sometimes questions or other circumstances turn these 5
in 5 more into a 5 in 10-15...

5 Music Things

Eventually, after a year or two, I'll revisit a subject. I recently noticed that I had
not talked about music related things in almost two and a half years, so I did
5 quick Jupyter notebooks and presented that. Interestingly enough, none of
these 5 things were covered back then. The github repo includes edited versions
of the notebooks, based on the interactions at the meeting during my presentation.
Requirements: All require the following
pip install jupyter

1 - Audio

2 - libROSA

Here we will need to pip install matplotlib and numpy, and of course librosa.

3 - music21

pip install music21
You'll need some external programs: Lilypond and Musescore
You also need launch scripts for each of them. On a mac, use the provided
launch scripts in the mac/ folder of this repo. Make sure you chmod a+x them.
Change the path in the notebook to reflect your own user path.

4 - python-sonic

pip install python-sonic
You'll need one external program, Sonic Pi, and you must start it before running
through the notebook.

5 - pyknon

pip install pyknon
You'll need one external program: timidity

easily installed:

  • in Linux with apt-get install timidity
  • on a Mac with brew install timidity
This was mostly an excuse to demo that external command line tools like timidity
or sox can be used here.

Have fun!
@f_dion - francois(dot)dion(at)gmail(dot)com

P.S.: Github repo at: but for some strange reason, github will not render the first (0-StartHere) notebook. This blog post is basically that notebook, putting things in context.

Sunday, September 25, 2016

Something for your mind: Polymath Podcast Episode 001

Two topics will be covered:

Chipmusic, limitations and creativity

NumFOCUS (Open code = better science)

The NumFOCUS interview was recorded at PyData Carolinas 2016. There will be a future episode covering the keynotes, tutorials, talks and lightning talks later this year. This interview was really more about open source and less about PyData.

The episode concludes with Learn more, on Claude Shannon and Harry Nyquist.

Something for your mind is available on


Francois Dion

Sunday, September 18, 2016

Something for your mind: Polymath Podcast launched

Some episodes will have more Art content, some will have more Business content, some will have more Science content, and some will be a nice blend of different things. But for sure, the show will live up to its name and provide you with “something for your mind”. It might raise more questions than it answers, and that is fine too.

Episode 000
Listen to Something for your mind on

Francois Dion

Monday, April 18, 2016

Los Alamos 10742: The Making of

Modern rendering of the original 1947 Memo 10742

Before reading

If you've not read the first part (The return of the Los Alamos Memo 10742) of this blog, go there now. There will be a link to come back here at the end, so you don't forget ...

Your assignment

If you remember, in the previous article, I had asked the students (and you, the reader) to try this exercise:

"Replicate either:
a) the whole memo
b) the list of numbers 
Whichever assignment you choose, the numbers must be generated programmatically."

One possible way

We'll use Python 3 and do b):

In [1]:
def num_to_words(n):
    """Returns a number in words, covering 0 to 100 inclusive."""
    # 12 is deliberately 'a dozen'. 'fourty' (sic) is kept as written:
    # changing the spelling would change the sort order shown further down.
    n2w = {
        0: 'zero', 1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five', 6: 'six',
        7: 'seven', 8: 'eight', 9: 'nine', 10: 'ten', 11: 'eleven', 12: 'a dozen',
        13: 'thirteen', 14: 'fourteen', 15: 'fifteen', 16: 'sixteen', 17: 'seventeen',
        18: 'eighteen', 19: 'nineteen', 
        20: 'twenty', 30: 'thirty', 40: 'fourty', 50: 'fifty', 60: 'sixty', 70: 'seventy',
        80: 'eighty', 90: 'ninety', 100: 'one hundred'
    }
    try:
        return n2w[n]
    except KeyError:
        # Compound numbers: tens word plus units word, e.g. 67 -> 'sixty seven'
        return n2w[n-n%10] + ' ' + n2w[n%10]
The famous twelve as 'a dozen'
In [2]:
num_to_words(12)
'a dozen'
In [3]:
In [4]:
num_to_words(67)
'sixty seven'
In [5]:
num_to_words(100)
'one hundred'
Generating the alphabetical word list, not including number 10
In [6]:
word_tuples = sorted([(num_to_words(num),num) for num in range(101) if num != 10])
Now that the list is sorted alphabetically, we just want the second item of each tuple (index 1)
In [7]:
result = list(zip(*word_tuples))[1]
Let's print this.
In [8]:
12, 8, 18, 80, 88, 85, 84, 89, 81, 87, 86, 83, 82, 11, 15, 50, 58, 55, 54, 59, 51, 57, 56, 53, 52, 5, 4, 14, 40, 48, 45, 44, 49, 41, 47, 46, 43, 42, 9, 19, 90, 98, 95, 94, 99, 91, 97, 96, 93, 92, 1, 100, 7, 17, 70, 78, 75, 74, 79, 71, 77, 76, 73, 72, 6, 16, 60, 68, 65, 64, 69, 61, 67, 66, 63, 62, 13, 30, 38, 35, 34, 39, 31, 37, 36, 33, 32, 3, 20, 28, 25, 24, 29, 21, 27, 26, 23, 22, 2, 0
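The `zip(*word_tuples)` step is the usual "unzip" idiom, and it may be easier to see on a small sample. The sketch below uses a hypothetical four-entry dictionary in place of the full 0 to 100 word list, and also shows an alternative (not the approach used above) that sorts the numbers directly with a key function.

```python
# Hypothetical four-entry sample, standing in for the full 0-100 word list.
names = {2: "two", 5: "five", 8: "eight", 12: "a dozen"}

# Approach used above: sort (word, number) pairs, then "unzip" with zip(*...).
word_tuples = sorted((word, num) for num, word in names.items())
words, numbers = zip(*word_tuples)

# Alternative: sort the numbers themselves, using their names as the sort key.
by_key = sorted(names, key=names.get)

print(numbers)  # (12, 8, 5, 2)
print(by_key)   # [12, 8, 5, 2]
```

Both give the same order; the key-function version simply skips building the intermediate tuples.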

If you read the comments on the previous article on the subject, you surely ran into Edward Carney's almost working proposed solution. I am adding it here as another way of attacking the problem. Edward used a module named num2words. As you'll discover over years of writing Python code, almost anything you can think of has already been done. And in some cases, multiple times.

Why did I say almost working? Let's see if somebody finds the issue. If not I'll post the correction in a future post (the very next one will diverge from this subject to talk about fractals). I'll also introduce the inflect module and since we're introducing some NLP concepts, I'll bring in NLTK too.

In [1]:
import num2words as n2w
In [2]:
key_set = []
[key_set.append(n2w.num2words(i)) for i in list(range(101))]
key_set[12] = 'dozen'
key_set[100] = 'one hundred'
numset_dict = dict(zip(key_set,list(range(101))))
line_breaks = [14, 30, 46, 62, 78, 94]
for i, k in enumerate(yvals):
    print('{} '.format(k[1]),end='')
    if i in line_breaks:
NameError                                 Traceback (most recent call last)
<ipython-input-2-6c7998a49267> in <module>()
      5 numset_dict = dict(zip(key_set,list(range(101))))
      6 line_breaks = [14, 30, 46, 62, 78, 94]
----> 7 for i, k in enumerate(yvals):
      8     print('{} '.format(k[1]),end='')
      9     if i in line_breaks:

NameError: name 'yvals' is not defined

You know the solution? Post it in the comments section.

Francois Dion

Monday, March 14, 2016

The date is the title...

J. Venn - Logic of Chance

Turtle Graphics?

The above looks suspiciously like a printout from my first session with Apple Logo (the language, not the branding), before I figured out the command for "pen up"...

A few months back, I was reading a few books and found the above in one of them. It is titled "Logic of Chance", by John Venn (mostly known for the Venn diagram). The year? 1866.

So, where were we? Ah yes...


Yes, that famous sequence of digits. What was the story with John Venn and pi, here? Whereas I used digits 0-9 in "the 10 colors of pi", John used digits 0-7, discarding all 8s and 9s. Since back then there were no computers, he picked his numbers from a book (by W. Shanks) which had 707 digits of pi, leaving him with 568 digits between 0 and 7. He mapped the digits 0 to 7 to eight compass directions (10 directions might have felt a bit odd, at 36 degrees, versus nice 45 degree lines):

Although he doesn't specify the mapping, it is easy to infer from the graph. The first digit after the decimal is 1, then 4 and we can see the path as NE, then S, so:

0 N
1 NE
2 E
3 SE
4 S
5 SW
6 W
7 NW

The random walk

He would then move by 1 unit in the direction of each digit / direction mapping. NE, S, NE, SW, skip 9, E, so on and so forth. (NB: This is easy to reproduce in python with the turtle module. A quick search of my blog will get you started on this, from a pi generator to import turtle.)
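As a minimal sketch of the walk without any graphics, the code below turns digits into a list of grid coordinates using the mapping inferred above. The first 50 decimals of pi are hard-coded rather than generated, the names `STEPS` and `venn_walk` are made up for the example, and diagonal moves are simplified to one grid cell on each axis (an assumption, not something Venn specifies).

```python
# First 50 decimal digits of pi (after the "3."), hard-coded for this sketch.
PI_DECIMALS = "14159265358979323846264338327950288419716939937510"

# Digit -> (dx, dy) step, following the inferred mapping (0 = N, then clockwise).
STEPS = {
    0: (0, 1),    # N
    1: (1, 1),    # NE
    2: (1, 0),    # E
    3: (1, -1),   # SE
    4: (0, -1),   # S
    5: (-1, -1),  # SW
    6: (-1, 0),   # W
    7: (-1, 1),   # NW
}


def venn_walk(digits):
    """Return the list of (x, y) points visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for ch in digits:
        d = int(ch)
        if d not in STEPS:  # 8s and 9s are discarded, as Venn did
            continue
        dx, dy = STEPS[d]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


path = venn_walk(PI_DECIMALS)
# NE, S, NE, SW, (skip 9), E
print(path[:6])  # [(0, 0), (1, 1), (1, 0), (2, 1), (1, 0), (2, 0)]
```

The resulting list of points can be handed to the turtle module, or to any plotting library, to draw the path.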

His conclusion stated: 
"The result seems to me to furnish a very fair graphical indication of randomness". 

Francois Dion

Saturday, March 5, 2016

The return of the Los Alamos Memo 10742 -

Modern rendering of the original 1947 Memo 10742

The mathematician prankster

Can you imagine yourself receiving this memo in your inbox in Washington in 1947? There's a certain artistic je ne sais quoi in this memo...

This prank was made by J. Carson Mark and Stan Ulam. A&S was Administration and Services.

And Ulam, well known for working on the Manhattan project, also worked on really interesting things in mathematics. Specifically, he collaborated with Nicholas Constantine Metropolis and John von Neumann on what you might know as the Monte Carlo method (so named because Ulam's uncle was always asking for money to go and gamble in a Monte Carlo casino...). Some people have learned about a specific Monte Carlo simulation (the first) known as Buffon's needle.

Copying the prankster

When I stumbled upon this many years ago, I decided that it would make a fantastic programming challenge for a workshop and/or class. I first tried it in a Java class, but people didn't quite get into it. Many years later I redid it as part of a weekly Python class I was teaching at a previous employer.

The document is the output of a Python script. In order to make the memo look like it came from the era, I photocopied it. It still didn't look quite right, so I then scanned that into GIMP and bumped the Red and Blue in the color balance tool to give it that stencil / mimeograph / ditto look.

Your assignment

Here is what I asked the students:

"Replicate either:
a) the whole memo
b) the list of numbers 
Whichever assignment you choose, the numbers must be generated programmatically."

That was basically it. So, go ahead and try it. In Python. Or in R, or whatever you fancy and post a solution as a comment.

We will come back in some days (so everybody gets a chance to try it) and present some possible methods of doing this. Oh, and why the title of "the return of the Los Alamos Memo"? Well, I noticed I had blogged about it before some years back, but never detailed it...

Learning more on Stan Ulam

See the wikipedia entry and also:


[EDIT: Part 2 is at: los-alamos-10742-making-of.html]

Francois Dion

Monday, January 4, 2016

Stack Overflow in Spanish

In case you haven't come across it, the Stack Overflow site is now available in Spanish. And not all the answers are the same as those on the English Stack Overflow. There is a good amount of exclusive content.


For example, someone asked: "Cómo instalar MySQLdb en OS X?" (How do I install MySQLdb on OS X?)

There are several answers, but I know that mine is something I have written only in Spanish:

MySQL-python is only compatible with Python 2 (Python3 WOS), and the pip is the Python 3 one:
$ which pip
It will most likely return something similar to:
To install under Python 2, you have to select the Python 2 pip:
$ sudo pip2 install MySQL-python
The other option is a pure-Python module that is compatible with both Python 2 and 3, such as pymysql.
Finally, to avoid version conflicts and also Apple's Pythons (which have various problems), it is best to install Python 2.x and 3.x with homebrew and to use virtualenv, which lets you create Python virtual environments, each with only the requirements for that environment. Without virtual environments you always have to be explicit: pip2 or pip3 instead of pip.

Francois Dion