How do you decide which Python data visualization library to use?
10 questions to ask yourself before you choose a Python data visualization library.
This article is for amateurs who have little or some experience with data visualization but a solid understanding of Python and data manipulation tools like NumPy, Xarray, and Pandas. You can use it as a guide to speed up your learning, to discover libraries you might not have heard of, or to get a quick overview of the fundamentals of data visualization.
Data visualization is an essential aspect of data analysis, and it involves representing data in a graphical form. It is a powerful tool that can help people understand complex data and communicate insights more effectively and engagingly. Visualizations help identify patterns, trends, and relationships within data that may be difficult to see with raw data.
Python offers many libraries that differ in functionality, capacity, and customization options. Library authors design them for specific purposes or audiences, so they also vary in capabilities, maintenance, community support, and how customizable they are. Some focus on particular data types. Some are high-level interfaces built on top of other libraries. All of them let developers and data analysts create informative, and often interactive, visualizations. However, choosing the right data visualization tool can be challenging.
This article will provide a list of libraries and a set of questions to consider before selecting among them.
The libraries
To begin with, here is a non-exhaustive list of some popular Python data visualization libraries and related tools:
Matplotlib, bokeh, Plotly, altair, D3.js, HoloViz, mpld3, folium, pyecharts, Highcharts, chartify, vincent, vega, PyCairo, Lightning, Bqplot, Mayavi, VisPy, Gleam, Chaco, Graphviz, NetworkX, Geoplotlib, Pygal, ggplot2, seaborn, Cartopy, DNA Feature Viewer, Plotnine, gammapy, astropy, Dash, toyplot, cufflinks, ipyvolume, ipyleaflet, glumpy, visvis, GR Framework, d3po, glueviz, scikit-plot, yellowbrick, ggpy, basemap, YT, graph-tool, Vaex, pythreejs, PyQTGraph, Missingno, Leather, OpenCV, PIL, scikit-image, simpleitk, visualkeras, conx, PlotNeuralNet, Poplar, tensorspace, NN-SVG, MXNet, PyTorchViz, Netron, Tensorboard, Blender.
You can find most of these listed on PyViz, an overview of the Python data visualization ecosystem.
The Questions
With the following questions, we can better select a library in which to invest our time and resources. We will address licenses, data, use case, implementation, and capabilities such as interactivity, animation, customization, and rendering output.
1. What is your use case?
Your specific use case should be a key consideration when selecting a data visualization library.
Some libraries are designed for scientific or academic use, while others target business or marketing applications. Some are built for particular data types, while others are general-purpose. Beyond the data type, consider whether you also need to process the data in the same tool. Some libraries offer visualization as an extension of their central function: for example, libraries for processing graph-structured data that can also draw the graphs, or neural network libraries that can visualize the various stages of the process. It’s also important to note the difference between Scientific Visualization (SciVis) and Information Visualization (InfoVis) libraries. SciVis libraries usually visualize complex scientific data, such as 3D data, astronomical data, or simulation results. InfoVis libraries, on the other hand, visualize abstract or non-geometric data, such as network graphs or financial data, often for a more general audience.
Understanding your specific use case can help you choose the proper library.
2. Do you know your data?
When working with data visualization tools, it is crucial to understand the data you are working with, as the visuals you create are dependent on it. There are several aspects of data to consider when choosing a visualization library that suits your needs.
Sources: Where is your data?
You can draw data from various sources, such as local files, databases, web services, scraped web pages (for example, using BeautifulSoup or Scrapy), or real-time data streams. Most plotting libraries do not fetch the data themselves; you typically load it into a structured form, such as a Pandas DataFrame, and pass that to the library. For example, Pandas can execute a query against a SQL database and return a DataFrame that Matplotlib or Seaborn can plot, and BeautifulSoup or Scrapy can scrape data from websites into a structured format for visualization. Some libraries integrate more directly with a source: Bokeh, for instance, provides an AjaxDataSource that polls a REST endpoint for new data. Products built around Plotly have also offered connectors to online data services such as Google Sheets, Dropbox, and Microsoft Excel Online. You can usually reach the same result with different libraries, but some will require more effort from you.
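As a rough illustration of loading data from a source and then plotting it, the sketch below queries a database with Pandas and hands the result to Matplotlib. The in-memory SQLite database, the `sales` table, and its values are made up for this example; in practice you would connect to your own database.

```python
import sqlite3

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: a throwaway in-memory SQLite database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Jan", 120.0), ("Feb", 95.5), ("Mar", 143.2)],
)
conn.commit()

# Pandas runs the query and returns a DataFrame.
df = pd.read_sql("SELECT month, revenue FROM sales", conn)
conn.close()

# The plotting code only sees the DataFrame, not the database.
fig, ax = plt.subplots()
ax.bar(df["month"], df["revenue"])
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
```

Because the plotting step only depends on the DataFrame, you could swap the SQL query for a CSV file or a web API response without touching the chart code.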
Size: How big is your data?
Here is the answer to an age-old question: size matters, at least for data visualization libraries. One of the most significant differences between these libraries is their limits, such as how long it takes to render all your data. Some can display only a few thousand data points within a reasonable time. Others can handle data sets that are orders of magnitude larger by offering optimized approaches, such as aggregating the data before rendering with tools like Datashader or Vaex.
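To make the idea of aggregating before rendering concrete, here is a minimal sketch that bins a million points into a fixed-size grid with NumPy and draws the grid as an image. It only illustrates the general principle; it is not the API of Datashader or Vaex, and the random data and the grid size are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: one million correlated points.
rng = np.random.default_rng(seed=0)
n_points = 1_000_000
x = rng.normal(size=n_points)
y = 0.5 * x + rng.normal(scale=0.5, size=n_points)

# Aggregate the points into a 300 x 200 grid of counts.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(300, 200))

# Draw the grid instead of a million individual markers.
fig, ax = plt.subplots()
ax.imshow(
    counts.T,  # histogram2d puts x on axis 0; imshow draws rows along y
    origin="lower",
    extent=[x_edges[0], x_edges[-1], y_edges[0], y_edges[-1]],
    aspect="auto",
    cmap="viridis",
)
```

The cost of drawing now depends on the grid size rather than on the number of data points, which is why this kind of approach scales to large data sets.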
Type and Format: What type of data do you want to visualize, how is it formatted?
Data visualization libraries can work with many data types, such as graphs, neural networks, geospatial data, time series, financial data, virtual environment data, DNA sequences, voxels, 3D terrains, waveforms, spectrograms, musical scores, and meteorological data. These data may come in structured or unstructured formats. Structured data formats, such as CSV, XML, and JSON, organize the data with defined storage, labeling, and separation rules. In contrast, unstructured data formats such as plain text, audio, video, and images do not have a predefined structure or syntax, making them more challenging for computers to read and process. For example, Matplotlib is a popular library for creating plots from structured data. It can display some types of unstructured data, such as images, but it is not optimized for processing or analyzing them. Libraries like OpenCV, PIL, or Scikit-image may be more appropriate for most use cases based on unstructured data. Choose a library that can handle the specific data types and formats you need.
Batch vs streaming: Are you working with batch or streaming data?
Another consideration is whether you work with batch or streaming data. All data visualization libraries can handle batch data, but some require more effort to implement real-time updates for streaming data. HoloViews, for example, offers several options for streaming data, such as its Pipe and Buffer streams, integration with the streamz library, and asynchronous updates through the Tornado IOLoop. Other libraries, such as Matplotlib, Bokeh, and Plotly, can also handle streaming data, with varying ease and convenience.
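The core pattern behind most streaming plots is a fixed-size buffer that keeps only the latest samples and redraws from it. The sketch below shows that pattern with the standard library and Matplotlib; it is not the HoloViews API, and `on_new_sample`, the window size, and the sine-wave "stream" are assumptions made for illustration.

```python
import math
from collections import deque

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

WINDOW = 50  # number of most recent samples to keep on screen
buffer = deque(maxlen=WINDOW)  # old samples fall off automatically

fig, ax = plt.subplots()
(line,) = ax.plot([], [])


def on_new_sample(t, value):
    """Hypothetical callback: store the sample and update the line."""
    buffer.append((t, value))
    times, values = zip(*buffer)
    line.set_data(times, values)
    ax.relim()           # recompute data limits from the updated line
    ax.autoscale_view()  # rescale the axes to the new limits


# Simulate a stream of 200 samples; a real source would call on_new_sample
# whenever new data arrives.
for t in range(200):
    on_new_sample(t, math.sin(t / 10))
```

Libraries with built-in streaming support wrap this bookkeeping for you, which is where the difference in ease and convenience shows up.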
In summary, choosing the right data visualization tool requires careful consideration of various factors of your data, including data source, size, type, and format, as well as whether you work with batch or streaming data. By understanding these aspects, you can choose a library that meets your specific needs and helps you gain meaningful insights from your data.
3. Which charts do you want to use?
If a picture can say a thousand words, which picture says precisely the words you want to communicate? We can visualize any data in multiple ways using various charts. A well-chosen chart can communicate complex data effectively and efficiently, enabling the viewer to grasp the underlying patterns, trends, or relationships easily. It will enrich your storytelling so that you can communicate the exact actionable insights your audience needs.
Some examples of chart types include line charts, bar charts, column charts, stacked bar/column charts, area charts, scatter plots, bubble charts, heat maps, tree maps, choropleth and other geospatial maps, Sankey diagrams, parallel coordinates plots, radar/spider charts, box plots, violin plots, histograms, Pareto charts, Gantt charts, waterfall charts, word clouds, network diagrams, sunburst charts, donut charts, gauge charts, streamgraphs, and pie charts. Here is a list of examples: https://www.python-graph-gallery.com/?utm_content=cmp-true
Each library may offer different chart types with varying ease of use, and their terminology may vary.
Choosing the correct data visualization library is essential to communicate your data effectively. Ensure that the library you select offers the specific chart types you require and provides the customization options to meet your data visualization needs. By selecting the appropriate chart type and library, you can create compelling, visually appealing representations that convey your data’s story accurately and effectively.
4. Would you like to interact with the charts?
Interactivity can boost your ability to explore and analyze data through different methods, such as filtering, sorting, zooming, and panning. These features enhance your abilities to visualize complex data sets or highlight specific aspects of the data. For example, you can explore the data more closely and increase your understanding of the underlying patterns and trends. You can filter and sort the data to focus on specific aspects of the data set. You can also zoom in and out to examine the data at different levels of granularity. These features can be instrumental in working with large data sets. Interactivity can help you communicate your findings more effectively by enabling you to create dynamic and engaging visualizations that tell a story about the data. Interaction can also provide an opportunity to identify outliers or anomalies in the data that are hard to find with a static visualization.
Many data visualization libraries offer such capabilities, but they differ in the level of interactivity they provide. Some libraries offer advanced interaction features, such as tooltips, hover effects, and more sophisticated zooming. Others have more limited options or focus on other aspects of data visualization, such as creating static charts and graphs. The level of interactivity a library offers can significantly impact the user’s ability to explore and analyze the data effectively.
Ultimately, the level of interactivity required will depend on your specific needs and intended audience for the visualizations. Therefore, keep the interactivity capabilities of a library in mind before making your choice.
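To give a feel for the lower end of that spectrum, here is a minimal sketch of hand-written interactivity in Matplotlib: a click handler that labels the nearest data point. The data, labels, and the helper `nearest_point_index` are made up for illustration, and the handler only fires when the figure is shown with an interactive backend. Libraries with richer interactivity typically provide tooltips like this without custom code.

```python
import matplotlib
matplotlib.use("Agg")  # swap for an interactive backend to actually click
import matplotlib.pyplot as plt
import numpy as np

# Made-up data points and labels.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 1.0, 4.0, 3.0])
labels = ["a", "b", "c", "d"]

fig, ax = plt.subplots()
ax.scatter(xs, ys)
annotation = ax.annotate("", xy=(0, 0), xytext=(5, 5), textcoords="offset points")
annotation.set_visible(False)


def nearest_point_index(x_click, y_click):
    """Index of the data point closest to the click, in data coordinates."""
    return int(np.argmin((xs - x_click) ** 2 + (ys - y_click) ** 2))


def on_click(event):
    """Show the label of the nearest point when the user clicks in the axes."""
    if event.inaxes is not ax:
        return
    i = nearest_point_index(event.xdata, event.ydata)
    annotation.xy = (xs[i], ys[i])
    annotation.set_text(labels[i])
    annotation.set_visible(True)
    fig.canvas.draw_idle()


# Register the handler; Matplotlib returns an id that can be used to disconnect it.
connection_id = fig.canvas.mpl_connect("button_press_event", on_click)
```

Measuring distance in data coordinates is a simplification; with very different axis scales you would compare positions in pixels instead.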
5. Would you like to animate your visualization?
Animations can effectively present data changes over time, such as system evolution or the behavior of a variable. They can demonstrate cause-and-effect relationships or highlight patterns and trends that are not immediately apparent in static visualizations.
Some data visualization libraries provide a range of options for customizing and controlling the animation, such as the frame rate, duration, and playback controls. While many popular data visualization libraries offer animation capabilities, the animation creation process can sometimes be complex and require significant coding expertise. In some cases, creating an animation in these libraries may require writing custom code or using external tools to generate animated GIFs or videos. Even worse, some libraries may not offer built-in animation options at all, often because these libraries focus on other aspects of data visualization instead of prioritizing animation as a key feature.
For example, libraries that offer animation capabilities include Matplotlib, Plotly, and Bokeh.
If animation is a critical aspect of your visualization goals, you may need to choose a library that offers robust and user-friendly animation tools.
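As one example of what built-in animation support looks like, the sketch below uses Matplotlib's `FuncAnimation` to animate a travelling sine wave. The wave, the frame count, and the interval are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))
ax.set_ylim(-1.1, 1.1)


def update(frame):
    """Shift the wave a little further for each frame."""
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)


# 60 frames, 50 ms apart; blit=True redraws only the artists returned by update.
anim = FuncAnimation(fig, update, frames=60, interval=50, blit=True)

# Exporting needs an extra writer, for example Pillow for GIFs:
# anim.save("wave.gif", writer="pillow")
```

Even in this small case you write the per-frame update yourself and need an external writer to export the result, which is the kind of effort to weigh when comparing libraries.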
6. How much would you like to customize the data visualization?
While most libraries provide preset visualizations, your use case might require you to change them further. What you might want to customize includes everything we have discussed so far: the charts, interaction, and animation. Universal design, such as supporting color blindness through the choice of color and minimizing information overload, is another consideration when visualizing data. It can affect what your audience takes away.
While most data visualization libraries provide a range of chart types, the options for customization and styling can vary significantly between libraries. Some libraries provide only a limited set of customization options, while others offer extensive options for customizing colors, color schemes, fonts, labels, and animations. The same goes for both interaction and animation.
For example, Bokeh is a library that allows you to toggle the enabled interactions. If you also want to customize your color choices, the Python ecosystem contains a set of external color libraries at your disposal. They provide a range of preset color maps and palettes. Here are some options: colorcet, palettable, cmocean, CMasher, cmcrameri, viscm
Choosing a library that offers the level of customization your specific visualization needs is therefore pivotal.
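As a small illustration of styling choices, the sketch below colors a bar chart with `cividis`, a color map that ships with Matplotlib and was designed with color vision deficiency in mind, and removes some visual clutter. The categories and values are made up; the external color libraries listed above provide further color maps that plug in similarly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up example data.
categories = ["A", "B", "C", "D"]
values = [3, 7, 5, 2]

# Sample four evenly spaced colors from the color map.
cmap = plt.get_cmap("cividis")
colors = [cmap(v) for v in np.linspace(0.1, 0.9, len(categories))]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(categories, values, color=colors)
ax.set_title("Made-up example values", fontsize=12)
ax.set_ylabel("Count")

# Reduce clutter by hiding the top and right borders.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```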
7. Do you need a specific implementation?
A common task for developers is searching for API methods. A practical tip to expedite this process is first to grasp the fundamental components of a library, such as the API type, relevant terminologies, syntax, and component structures. This understanding simplifies the learning curve and accelerates the search, enabling faster development and knowledge transfer between libraries, ultimately expanding your toolkit.
We can categorize the APIs that visualization libraries offer as imperative or declarative. The choice between these paradigms depends on the desired level of control over the visualization process and the preferred way of specifying data and graphic elements.
Imperative APIs provide programmers with in-depth control over each step in the data visualization process. Matplotlib (including its pyplot interface) and Pandas’ .plot() API are examples of imperative APIs.
On the other hand, declarative APIs separate the description of a visualization from its rendering: users specify the desired graphics or data types, and the library handles the rendering. Examples of declarative API libraries include ggplot, plotnine, Altair, Vega, and Bokeh. There are two common ways of using declarative APIs: specifying the graphics and then applying the data, as in Vega and Vega-Lite, or specifying the data types and letting the library choose appropriate graphics, as in the HoloViz ecosystem.
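To show what a declarative description can look like, the sketch below builds a small Vega-Lite style specification as a plain Python dictionary and serializes it to JSON. The data values are made up, and the code only constructs the specification; rendering it requires a Vega-Lite aware tool. Libraries such as Altair generate specifications of this kind for you.

```python
import json

# Made-up data rows.
rows = [
    {"category": "A", "value": 3},
    {"category": "B", "value": 7},
    {"category": "C", "value": 5},
]

# The spec says what to draw (bars, which field goes on which axis),
# not how to draw it step by step.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Changing `"mark": "bar"` to `"mark": "point"` would change the chart type without touching the data or the encodings, which is the practical appeal of the declarative style.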
Furthermore, some libraries offer more than one API style. Matplotlib, for example, provides both an implicit, state-based pyplot interface and an explicit object-oriented interface, catering to different user preferences and requirements.
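Matplotlib's two main interfaces can be compared side by side. The sketch below draws the same made-up data twice: once through pyplot functions that act on the current figure, and once through explicit figure and axes objects.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Implicit pyplot interface: each call acts on the "current" figure and axes.
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")
implicit_ax = plt.gca()  # grab the current axes only to inspect it later

# Explicit object-oriented interface: you hold the figure and axes yourself.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object-oriented interface")
```

The pyplot style is convenient for quick, single plots, while the explicit style tends to be easier to follow once a script manages several figures or subplots.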
When it comes to syntax, different libraries each have their own, but some share similarities, as Matplotlib does with MATLAB. Some libraries also mirror the APIs of other popular libraries, making it easier for users to transition between them.
Lastly, when it comes to the implementations, data visualization libraries also differ in their code implementation and structuring of visual elements. For example, they may contain elements for colors, shapes, widgets, and chart types. Also, they may have arranged these components in column-row or graph structures, like trees.
8. Where would you like to render and publish the output?
Once you have defined your visuals, you might want to view them as apps or images, include them in reports or presentations, or share them in print, such as in books, papers, or articles. Print requires that the visualizations are of high enough quality at the desired size and resolution. You might also want to share them with non-technical users who do not have access to the tools or libraries you used to create them. Libraries differ in the options they offer for each of these.
Data visualization libraries rely on different rendering technologies, such as WebGL, OpenGL, Matplotlib, or JavaScript, which offer different levels of resolution and control. Libraries also differ in their export options, which can include static images, native GUI apps, Jupyter Notebook embedding, PDF or HTML export, and standalone web-based dashboards and apps that you can share with others. Matplotlib, for example, provides hardcopy backends that write files suitable for print, interactive backends for on-screen use, and support for user-defined backends. If you prefer a low-code alternative, platforms such as Power BI, Excel, and Tableau can also visualize your data.
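As an example of export options, the sketch below saves one Matplotlib figure as PNG, PDF, and SVG. It writes to in-memory buffers to stay self-contained; passing a filename to `savefig` works the same way. The figure size and the 300 dpi setting are illustrative choices.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))  # size in inches
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_xlabel("x")
ax.set_ylabel("y")

# Export the same figure in a raster format (PNG) and two vector formats.
exports = {}
for fmt in ("png", "pdf", "svg"):
    buffer = io.BytesIO()
    fig.savefig(buffer, format=fmt, dpi=300)
    exports[fmt] = buffer.getvalue()

# Read the pixel dimensions from the PNG header: 6 x 4 inches at 300 dpi.
width_px, height_px = struct.unpack(">II", exports["png"][16:24])
print(width_px, height_px)
```

Vector formats such as PDF and SVG scale without loss, which usually makes them the safer choice for print, while a raster format needs its resolution chosen up front.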
In summary, the ability to export visualizations in different formats is an important consideration when choosing a data visualization library, as it can affect the usability and accessibility of the visualizations you create.
9. Is the library still maintained?
Opting for a maintained library with well-documented code and an active community gives you a more reliable and up-to-date tool for your data visualization needs, as well as support for issues that arise.
When choosing a library, it is crucial to consider its maintenance status, maintenance history, and user community. Dormant but seemingly active libraries, unmaintained libraries, or libraries with small community support may have compatibility issues with newer software or unresolved bugs. Such libraries could require you to create workarounds, fix problems, or migrate to other software.
For instance, dormant data visualization projects such as d3po, ggpy, vega, visvis, Leather, Lightning, and Gleam may still function for most use cases. However, they could contain unaddressed bugs or compatibility issues due to a lack of maintenance.
Choosing a well-maintained data visualization library is vital for ensuring reliability, current features, and a supportive user community. Be mindful of the library’s maintenance status and active user base to avoid potential issues or the need to switch to another software.
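One rough signal of maintenance status is the time since a package's last release. The sketch below computes it from release metadata shaped like the response of the PyPI JSON API (`https://pypi.org/pypi/<package>/json`). The helper names and the sample payload are made up for illustration, the fetch function needs network access and is not called here, and release age is only one indicator among several.

```python
import json
from datetime import datetime
from urllib.request import urlopen


def fetch_pypi_metadata(package):
    """Download a package's metadata from the PyPI JSON API (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as response:
        return json.load(response)


def latest_upload(metadata):
    """Return the most recent upload time across all releases, or None."""
    uploads = [
        datetime.fromisoformat(file_info["upload_time"])
        for files in metadata["releases"].values()
        for file_info in files
    ]
    return max(uploads) if uploads else None


def days_since_last_release(metadata, today):
    """Number of whole days between the latest upload and `today`."""
    latest = latest_upload(metadata)
    return None if latest is None else (today - latest).days


# Made-up payload that mimics the assumed shape of the API response.
sample = {
    "releases": {
        "0.1.0": [{"upload_time": "2019-03-01T10:00:00"}],
        "0.2.0": [{"upload_time": "2020-06-15T08:30:00"}],
        "0.3.0": [],  # a version with no uploaded files
    }
}

age = days_since_last_release(sample, today=datetime(2021, 6, 15, 8, 30, 0))
print(age)
```

A long gap since the last release does not prove a project is abandoned, so it is worth combining this with a look at the issue tracker and recent commits.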
10. What is your budget?
Budgeting is an important consideration when choosing a data visualization library, as these libraries can vary in price depending on their features and licensing model.
Data visualization libraries come in various price tiers, ranging from completely open source to fully closed behind a paywall. Completely open-source libraries are freely available to anyone. Fully closed libraries require a paid subscription or license to access their features. Other libraries combine open-source and paid services, offering some features for free but requiring payment for more advanced features or additional support. Which option is most cost-effective depends on your needs and budget. It’s also worth noting that some libraries may offer discounts for academic or non-profit use.
For example, Tableau is a data visualization tool that requires a paid subscription to access its full suite of features. Plotly is an example of a library that offers an open-source version and a paid version with additional features and support.
Considering the pricing model of a data visualization library before choosing it for your project can save you money later.
Conclusion
In conclusion, when choosing a data visualization tool, it is critical to consider the purpose and audience of your data visualization. The capabilities of a library, such as data source compatibility, available chart types, customization options, interactivity, community support and documentation, data transformation tools, exporting options, generality or specificity, and cost and licensing, can affect what you can communicate to your audience. Ultimately, the right tool for a particular task will depend on your specific use case and needs.