Q: Couple of questions.
(1) Can we specify how to plot data? Some of us might already know which graph type is best. I use R/ggplot2 for my academic research (currently involving discourse analysis, and in the past, social network analysis). (2) How "ready" must the data be? Do I still need to hire an assistant to clean the data, etc.? Thanks!

Jack_ChartPixel
Aug 20, 2024
A: Good question! Here's a slightly longer answer that explains how ChartPixel determines which charts to draw.
1. After you upload the data, you will first see an optional column selection screen. Based on your selection, you get a gallery of recommended charts: basically, charts suited to the type of data and ranked by their insight quality. If you select very few columns, or your data is aggregated or very small, all charts may be shown, including those with low insight quality. Insight quality is determined by our AI and ML models.
There is also a data exploration link that lets you pick up to three columns and click "generate charts" to produce charts based on your choice. For example:
- if you pick a categorical variable and a numerical variable, you will likely get a bar chart
- if you pick a date column and a numerical variable, you will get a line chart
- if you pick just a numerical variable, you will likely get a density plot
- if you pick two numerical variables, you might get a scatter plot, regression etc.
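To make the mapping above concrete, here is a rough Python sketch (not ChartPixel's actual code) of a rule table that picks a chart from the types of the selected columns. The function name `suggest_chart`, the type labels, and the "table" fallback are hypothetical:

```python
def suggest_chart(column_types: list[str]) -> str:
    """Return a plausible chart type for a selection of one to three columns.

    Each entry of column_types is assumed to be "categorical", "numerical" or "date".
    """
    if not 1 <= len(column_types) <= 3:
        raise ValueError("select between one and three columns")
    kinds = sorted(column_types)  # order-independent matching
    if kinds == ["numerical"]:
        return "density plot"
    if kinds == ["categorical", "numerical"]:
        return "bar chart"
    if kinds == ["date", "numerical"]:
        return "line chart"
    if kinds == ["numerical", "numerical"]:
        return "scatter plot"
    return "table"  # hypothetical fallback when no rule matches


print(suggest_chart(["numerical", "categorical"]))  # bar chart
```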
Sometimes a specific chart is not drawn, though: for example, if you combine a categorical variable with a numerical variable and there are too many categories, or some categories have a very small sample size. In this case, you might get only a ranking chart with less statistical insight. A chart you expect may also be skipped if, for example, the bars are all the same or N/A. These "checks" can, however, be overridden in exploration mode.
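The kind of check described above could look roughly like this in Python. The function `check_bar_chart` and its thresholds (`max_categories=20`, `min_group_size=5`) are illustrative assumptions, not ChartPixel's real logic or values:

```python
from collections import defaultdict
from statistics import mean


def check_bar_chart(rows, max_categories=20, min_group_size=5):
    """Decide whether a category-vs-number bar chart is worth drawing.

    rows: iterable of (category, value) pairs; value is None for N/A.
    Categories whose values are all N/A are ignored.
    """
    groups = defaultdict(list)
    for category, value in rows:
        if value is not None:
            groups[category].append(value)
    if not groups:
        return False, "all values are N/A"
    if len(groups) > max_categories:
        return False, "too many categories"
    small = sorted(c for c, vals in groups.items() if len(vals) < min_group_size)
    if small:
        return False, f"small sample size: {small}"
    bar_heights = {c: mean(vals) for c, vals in groups.items()}
    if len(set(bar_heights.values())) == 1:
        return False, "all bars are the same"
    return True, "ok"


print(check_bar_chart([("A", 1)] * 5 + [("B", 2)] * 2))
# (False, "small sample size: ['B']")
```

A failed check corresponds to the cases above where a simpler chart may be shown instead; an override would simply skip the check.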
Bear in mind that although ChartPixel has many different chart types and we are continuously adding more, they are classic chart types rather than infographics or specialty charts. We currently don't display network graphs: ChartPixel does not perform the deep textual analysis (this may change in the future) that such a network graph would require. It does, however, show word clouds from comments and, if combined with dates, may even show you trends in keywords.
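Keyword trends like these can be approximated with a simple per-month word count. This Python sketch assumes comments arrive as `(date, text)` pairs and uses a tiny, hypothetical stopword list; it is plain frequency counting, not the deeper textual analysis mentioned above:

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = frozenset({"the", "a", "and", "is", "it", "to", "of"})


def keyword_trends(comments, stopwords=STOPWORDS):
    """comments: iterable of (date, text). Returns {"YYYY-MM": Counter of words}."""
    by_month = defaultdict(Counter)
    for day, text in comments:
        words = re.findall(r"[a-z']+", text.lower())
        by_month[day.strftime("%Y-%m")].update(w for w in words if w not in stopwords)
    return dict(by_month)


comments = [
    (date(2024, 1, 5), "Shipping is slow"),
    (date(2024, 1, 20), "Slow support and slow shipping"),
    (date(2024, 2, 3), "Great price"),
]
trends = keyword_trends(comments)
print(trends["2024-01"].most_common(2))  # [('slow', 3), ('shipping', 2)]
```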
2. ChartPixel does a lot of wrangling and cleaning behind the scenes, not all of which is explained in detail here. Here are some common things it does:
- groups numerical variables (if they are continuous enough) to offer more chart types
- removes duplicates
- removes rows that appear odd; for example, a row with a single unidentified string in a column of a large dataset
- converts textual numbers into numbers; for example, 2 dollars 50 cents becomes 2.5, and the column name is changed to indicate that the values are in dollars.
- converts units into numbers; for example, 2M 250T becomes 2250000. The same applies to some standard units it finds, e.g., meters, speed units, and currencies.
- checks dates for potential mismatches (accidental month/day reversals)
- groups similar columns so they can be represented in one graph, for example, the items of a questionnaire
- semantically ranks categories; for example, values such as strongly agree, agree, neither agree nor disagree, disagree, or high school, bachelor, PhD, etc., are put in order to produce better charts and extract meaningful trends.
- handles panel data if you have date columns
- extracts multiple tables from one spreadsheet
- recognizes country names as countries and derives regions, sub-regions, etc.
- parses percentages; for example, a value of 20% becomes numerical for calculations, while the charts still show the percentage sign.
- renames long column names for better visualizations
- tries to parse additional or merged headers in your dataset to make it "tabular"
- combines certain columns semantically, for example, first name and last name, to create unique identifiers. It also creates new columns, for example, age from date of birth.
- converts textual dates, to the best of its ability, into dates so that they can be shown on timelines, e.g., January 2024, Feb 2024, 2nd January, etc.
- converts some columns into True/False for better charts; for example, in a questionnaire, a string column from a multi-select question that represents a single option.
- converts missing values into N/A contextually.
This is a small selection of what ChartPixel checks. The algorithms behind these transformations have been continuously improved over time, and we continue to refine them with input from our users. They work better for some datasets than for others. Of course, if you upload a dataset that is already mostly "clean", the results will be better, as there are fewer unknowns in the whole process.
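As an illustration of the unit and percentage parsing in the list above, here is a minimal Python sketch, not ChartPixel's implementation. The unit tables are hypothetical and deliberately small; "T" is read as thousand to match the "250T" example, and because units are lowercased, "m" is always read as million here. A real parser would need context to tell million from meters, which this sketch does not attempt:

```python
import re

# Hypothetical unit tables; "t" means thousand to match the "250T" example.
MAGNITUDES = {"k": 1_000, "t": 1_000, "m": 1_000_000}
MONEY = {"dollar": 1.0, "dollars": 1.0, "cent": 0.01, "cents": 0.01}


def parse_compound(text):
    """Sum every '<number> <unit>' part, e.g. '2M 250T' or '2 dollars 50 cents'."""
    parts = re.findall(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)", text)
    if not parts:
        raise ValueError(f"no number/unit pairs in {text!r}")
    total = 0.0
    for number, unit in parts:
        factor = MAGNITUDES.get(unit.lower(), MONEY.get(unit.lower()))
        if factor is None:
            raise ValueError(f"unknown unit {unit!r} in {text!r}")
        total += float(number) * factor
    return total


def parse_percent(text):
    """'20%' -> 0.2 for calculations; a chart can still label it as 20%."""
    return float(text.strip().rstrip("%")) / 100


print(parse_compound("2M 250T"))             # 2250000.0
print(parse_compound("2 dollars 50 cents"))  # 2.5
print(parse_percent("20%"))                  # 0.2
```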
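Semantic ordering of categories can be approximated by matching a column's distinct values against known ordinal scales. The two scales below (including the "master" level) and the function `order_categories` are hypothetical examples; detecting such scales automatically, as described above, is the hard part and is not shown:

```python
# Hypothetical ordinal scales, each listed from lowest to highest.
SCALES = [
    ["strongly disagree", "disagree", "neither agree nor disagree", "agree", "strongly agree"],
    ["high school", "bachelor", "master", "phd"],
]


def order_categories(values):
    """Return the distinct values (lowercased) in their natural order if they
    fit a known scale, otherwise in alphabetical order."""
    distinct = {v.strip().lower() for v in values}
    for scale in SCALES:
        if distinct <= set(scale):  # every value belongs to this scale
            return [level for level in scale if level in distinct]
    return sorted(distinct)


print(order_categories(["Agree", "Strongly agree", "Disagree", "agree"]))
# ['disagree', 'agree', 'strongly agree']
```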
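One simple heuristic for spotting month/day reversals in `a/b/yyyy` strings: a value above 12 can only be a day, so a first field above 12 implies day-first and a second field above 12 implies month-first. If both occur in the same column, it probably mixes formats. The slash-separated format and the function names are assumptions for this sketch, not how ChartPixel necessarily does it:

```python
from datetime import date


def infer_date_order(strings):
    """Guess whether 'a/b/yyyy' strings are day-first or month-first."""
    pairs = [tuple(int(x) for x in s.split("/")[:2]) for s in strings]
    day_first = any(a > 12 for a, _ in pairs)    # a first field > 12 must be a day
    month_first = any(b > 12 for _, b in pairs)  # a second field > 12 must be a day
    if day_first and month_first:
        return "mixed"  # likely accidental month/day reversals: flag for review
    if day_first:
        return "day-first"
    if month_first:
        return "month-first"
    return "ambiguous"  # every field <= 12, so the order cannot be inferred


def parse_dates(strings):
    """Parse 'a/b/yyyy' strings once the field order is unambiguous."""
    order = infer_date_order(strings)
    if order not in ("day-first", "month-first"):
        raise ValueError(f"cannot safely parse dates: order is {order}")
    result = []
    for s in strings:
        a, b, year = (int(x) for x in s.split("/"))
        day, month = (a, b) if order == "day-first" else (b, a)
        result.append(date(year, month, day))
    return result


print(parse_dates(["03/04/2024", "25/04/2024"]))
# [datetime.date(2024, 4, 3), datetime.date(2024, 4, 25)]
```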
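For the True/False conversion, a common survey export layout (assumed here) has one column per multi-select option, holding the option's text when it was chosen and a blank otherwise. A minimal sketch with a hypothetical function name:

```python
def option_column_to_bool(cells):
    """Convert one multi-select option column to True/False.

    Assumes the column holds the option's text when chosen and a blank
    (empty string or None) otherwise; returns None if it doesn't look like that.
    """
    non_blank = {c.strip() for c in cells if c is not None and c.strip()}
    if len(non_blank) != 1:
        return None  # more than one distinct label (or none): leave as text
    return [c is not None and c.strip() != "" for c in cells]


print(option_column_to_bool(["Email", "", None, "Email"]))  # [True, False, False, True]
```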
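Derived columns such as age from date of birth, or a combined name identifier, are short computations. In this sketch the function names are hypothetical, and keep in mind that first plus last name is not guaranteed to be unique in every dataset:

```python
from datetime import date


def age_on(dob: date, reference: date) -> int:
    """Whole years between a date of birth and a reference date."""
    # Subtract one if the birthday has not yet occurred in the reference year.
    had_birthday = (reference.month, reference.day) >= (dob.month, dob.day)
    return reference.year - dob.year - (0 if had_birthday else 1)


def name_key(first: str, last: str) -> str:
    """Combine first and last name into one normalized identifier."""
    return f"{first.strip()} {last.strip()}".lower()


print(age_on(date(1990, 8, 21), date(2024, 8, 20)))  # 33
print(name_key(" Ada", "Lovelace "))                  # ada lovelace
```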
Hope this is helpful! Let us know if you have more questions; we are more than happy to answer.