The point of this article is easiest to understand through my reason for writing it. As part of my responsibilities as a data analyst, I was tasked with finding the best Python data visualization library for my team and other analytical groups. Our goal: to develop quickly and efficiently with a library that supports customization.
With free rein to select visualization libraries of interest, I began to research and document our needs and our requirements for the libraries, from visualization capabilities, code complexity and ease of customization to versioning-related issues.
After some introspection, my teammates and I settled on the following criteria, which we used to rate the libraries:
- Looks good: a library isn’t worth a dime if a manager doesn’t enjoy looking at its output, or if the end user can’t understand what you are presenting to them.
- Robustness: a robust library allows analysts and data scientists to perform complex analyses with few limitations, and supports a core arsenal of graphs as “out-of-the-box” capabilities.
- Complexity: visualization is a tool, not the goal of data analysis, so short, easy-to-read code is essential.
- Modularity: not every visualization is planned down to the smallest detail, so the ease with which we can add elements to our plots and remove them is key.
- Explorative: a library with good documentation and a solid community allows the developer to explore the far reaches of the library and produce ad hoc results if needed.
Based on these needs and specifications, a visualization library test was born. I hope it will become a standard evaluation for Python visualization libraries, one that every user can benefit from, regardless of experience level or functional group.
In the next section, we will review six of the most popular visualization libraries and how we ranked them.
The Visualization Library Test
Based on our criteria, we constructed a test with just three elements that together covered all of our needs:
- Visualization capabilities.
- Customization capabilities.
- Package references and architecture.
To score the test, we used a basic ranking scale, from None (the feature doesn’t exist) through Low-Medium to High, and split visualization capabilities into “must-have” and “nice-to-have” package built-ins.
- “Must-have”: line, bar, scatter, histogram, box, heatmap.
- “Nice-to-have”: waterfall, pair plots, geo-oriented, 3D.
Customization:
- Adding, positioning and visually customizing the legend, annotations and marks/symbols (and all three together, “Triple X”).
- Graph theming and templates: frames, gridlines, plot sizes, margins and configurations.
Package references and architecture:
- Dynamic capabilities.
- Documentation quality and completeness.
- Community support.
- Visualization appearance.
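The chart checklists above are easy to turn into a small coverage check. The following is only a minimal sketch of how that could be done, not the procedure we used: the function name `chart_coverage`, the normalized chart names and the example library are all assumptions made for illustration.

```python
# Chart checklists as listed in the test description above
# (names normalized to lowercase; the normalization is an assumption).
MUST_HAVE = {"line", "bar", "scatter", "histogram", "box", "heatmap"}
NICE_TO_HAVE = {"waterfall", "pair plot", "geo", "3d"}


def chart_coverage(supported):
    """Return the checklist charts a library is missing, grouped by tier."""
    supported = {name.lower() for name in supported}
    return {
        "must_have_missing": sorted(MUST_HAVE - supported),
        "nice_to_have_missing": sorted(NICE_TO_HAVE - supported),
    }


# Hypothetical library, used only for illustration (not one of the six reviewed).
example_library = ["Line", "Bar", "Scatter", "Histogram", "Box", "3D"]
report = chart_coverage(example_library)
print(report)
```

A report like this only says whether a chart type exists. How pleasant it is to build still has to be judged by hand, which is why the rest of the test relies on ratings rather than counts.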
A short disclaimer: the analysis could have been performed in many different ways, and supplementary research may lead others to different conclusions based on their experience, review and needs. However, I hope that sharing the structure of my review and its results will enable others to perform similar reviews for their own use cases.
Results by Segment
Diverse “out-of-the-box” graph types:
- Matplotlib (mid-high): Matplotlib had all the “must-have” capabilities, but not all the “nice-to-have” features we required on our projects. Overall, it is a very robust library with a clear development logic, which contributes to its high level of customizability but also increases complexity and development time.
- Seaborn (mid-high): Seaborn performed similarly to Matplotlib, except for edge cases such as the “nice-to-have” graphs: some were missing (the geo-oriented one, for example), while others are supported by the package. Like Matplotlib, it is highly configurable, but it requires more development expertise and attention due to its relative complexity.
- Plotly (high): Plotly had it all. The library included every chart on our wish list, including the “nice-to-have” ones, which is a rare find. Plotly’s syntax is also relatively simple, making it a package for users of all levels.
- Bokeh (mid): We were disappointed by Bokeh’s support for the capabilities we consider essential to our work. We couldn’t find all the charts on our “must-have” list, and those we did find were more difficult to work with. That said, Bokeh supports more complex analyses and plot types, and it performed quite well on the “nice-to-have” visualizations.
- Altair (mid-high): Altair performed well, especially on the “must-have” abilities, but also on the “nice-to-have” ones. As a declarative package, it needs only simple code to produce advanced visualizations and offers decent support for more complex features.
- Plotnine (mid): Plotnine performed below average. It is another declarative package, and it had all the “must-have” abilities but only a few of the “nice-to-have” ones. The real failure was documentation and community support: nothing makes it clear that you need R’s syntax and the ggplot2 documentation to work with the package. The code is fairly readable, but the library felt limited, and it may just be easier to work in R and benefit from the tidyverse.
Customization:
- Matplotlib (high): Again, Matplotlib was a real standout. It is a robust and agile library that enables you to create whatever you need. One downside: we found it had limited support for “Triple X” (mark, color and size) on scatter plots.
- Seaborn (high): If Matplotlib performed well, Seaborn proved to be even better. It took most, if not all, of what Matplotlib had to offer and improved on it, which is no surprise given that it was designed as a wrapper. And yes, it does support “Triple X”.
- Plotly (mid): Plotly rated a little low in this category. Some features were just average (its ability to use marks, for example), and the rest were limited. I got the impression that customization support is limited and requires excessive coding to configure, which raises the question: is the investment worth the reward? That said, the library is modular and has easy-to-read code that can produce in a few lines what other packages need many for.
- Bokeh (mid-high): Bokeh performed well here. It allowed us to do what the test required, but sometimes the price was extra (or excess) lines of code and online research.
- Altair (mid-high): Altair showed some great capabilities, with similarities to Bokeh. The library is robust and fairly easy to configure for its level of complexity, perhaps thanks to its well-maintained documentation. Side note: great modularity.
- Plotnine (mid): Again, Plotnine failed to impress. It was capable, but not much more, and it misses some of the strengths and charm of R’s ggplot2 package. Due to the lack of documentation for Plotnine, it wasn’t easy to test or work with.
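To make the “Triple X” remark about Matplotlib concrete, here is a minimal sketch of a scatter plot that varies mark, color and size together. The toy data and group names are hypothetical. The sketch relies on the fact that `Axes.scatter` accepts per-point sizes (`s`) and colors (`c`) but only one `marker` per call, so varying the mark means one call per group.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy data: (x, y, size, group)
points = [
    (1.0, 2.0, 40, "a"), (2.0, 3.5, 90, "a"),
    (1.5, 1.0, 60, "b"), (3.0, 2.5, 120, "b"),
    (2.5, 4.0, 30, "c"), (3.5, 1.5, 150, "c"),
]
markers = {"a": "o", "b": "s", "c": "^"}
colors = {"a": "tab:blue", "b": "tab:orange", "c": "tab:green"}

fig, ax = plt.subplots(figsize=(5, 4))
for group, marker in markers.items():
    rows = [p for p in points if p[3] == group]
    ax.scatter(
        [p[0] for p in rows],      # x values
        [p[1] for p in rows],      # y values
        s=[p[2] for p in rows],    # size can vary per point
        c=colors[group],           # one color per group
        marker=marker,             # only one marker per scatter() call
        label=group,
    )
ax.legend(title="group")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Render to an in-memory PNG; pass a file name to savefig() to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

The per-group loop is the extra work behind the “limited support” note. Seaborn's `scatterplot` exposes the same three channels as the `hue`, `size` and `style` parameters of a single call.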
Package references and architecture:
- Matplotlib (mid): It’s not news, but it’s still worth mentioning that Matplotlib fell short on one of the most important things: visualization appearance. In addition, the library isn’t dynamic. That said, as the oldest and most commonly used library, it has inherent strengths, among them a large user community and great documentation and guides, which make it easy to learn and explore.
- Seaborn (high): Seaborn performed well here. Its only downsides were that it isn’t dynamic and that its runtime was a little longer than average.
- Plotly (high): Plotly performed solidly on most aspects, and its dynamic abilities were a bonus. It is well documented, and the community offers good support for most of the issues you will face.
- Bokeh (mid-low): Here Bokeh trailed most of the other libraries. The package requires a significant number of imports, which can lead to excess lines of code, and its documentation is not as robust as that of some other libraries. It does, however, have a solid dynamic mode and great visual appeal.
- Altair (high): Similar to Plotly, Altair performed decently, with surprisingly informative and appealing documentation and beautiful plots. That said, it is a relatively young library and its community is still limited, although it appears to offer support readily.
- Plotnine (mid-low): In our eyes, documentation was the major drawback of the library. It was not detailed enough, and nothing indicated that we would need to fall back on the ggplot2 documentation. Aside from that, low Plotnine adoption among Python users leads to poor community support.
Disclaimer: visualization appearance is subjective. These ranks were assigned based on our needs and opinions, and different ranks during evaluation can lead to different conclusions.
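For readers who want one number per library, the ranks from the three result segments can be combined. The sketch below is only one way to do it, and it rests on assumptions that are not part of the original test: the numeric value given to each rank (the `SCALE` mapping) and the equal weighting of the three segments.

```python
# Assumed numeric mapping for the verbal ranks (not part of the original test).
SCALE = {"none": 0, "low": 1, "mid-low": 2, "mid": 3, "mid-high": 4, "high": 5}

# Ranks as reported in the three result segments above, in document order.
RATINGS = {
    "Matplotlib": ("mid-high", "high", "mid"),
    "Seaborn": ("mid-high", "high", "high"),
    "Plotly": ("high", "mid", "high"),
    "Bokeh": ("mid", "mid-high", "mid-low"),
    "Altair": ("mid-high", "mid-high", "high"),
    "Plotnine": ("mid", "mid", "mid-low"),
}


def total_score(ranks):
    """Sum the numeric value of each segment rank (equal weights assumed)."""
    return sum(SCALE[rank] for rank in ranks)


# sorted() is stable, so libraries with equal totals keep their listing order.
ranking = sorted(RATINGS, key=lambda lib: total_score(RATINGS[lib]), reverse=True)
for lib in ranking:
    print(f"{lib:<11}{total_score(RATINGS[lib]):>3}")
```

Under this assumed mapping, Seaborn, Plotly, Altair and Matplotlib come out as the top four, which is consistent with the top picks named in the wrap-up. A different mapping or different weights could change the order, so treat the totals as a reading aid rather than a verdict.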
One Last Thing Before We Wrap Up
Since we have been talking about visualization libraries, it would be a shame not to show some of the beautiful plots produced by these libraries.
Needless to say, we believe that an aesthetic appearance should require minimum configuration and be the default when possible, and we reviewed the libraries with this in mind.
Our top picks are Matplotlib, Seaborn, Altair and Plotly. While the choice will depend on how you plan to use their capabilities, you could just as easily flip a coin among them, because all of them provide strong support and appeal for most projects.
Also, feel free to test these libraries, or others, against the test’s metrics or for a specific use case, such as handling geo-oriented data or 3D visualizations.
If you do, we hope you will share your results so that the analytics community can learn from your conclusions and research. Together we can improve our data-visualization capabilities, empower decision makers and create a foundation for library selection in future projects.
As a final note, Bokeh is a wonderful library with great potential, and we look forward to following its growth. As of now, it didn’t perform very well on our test for our needs, especially on graph diversity, documentation and community requirements. If Bokeh resolves these issues moving forward, it will definitely shine.