Couple of questions. (1) Can we specify how to plot data? Some of us might already know what's the best graph type. I use R/ggplot2 for my academic research (currently involving discourse analysis, and in the past, social network analysis). (2) How "ready" must the data be? Do I still need to hire an assistant to clean the data, etc.? Thanks!

Question

Jack_ChartPixel · Answer

Good question! Here's a slightly longer answer to explain how ChartPixel determines which charts to draw.

1. After uploading the data, you will first see an optional column selection screen. From your selection, you get a gallery of recommended charts, basically charts ranked by their insight quality and suited to the type of data. If you select very few columns, or your data is aggregated or very small, all charts may already be shown, including those with low insight quality. Insight quality is determined by our AI and ML models.

There is also a data exploration link that lets you pick up to three columns and then click on "generate charts". This will produce charts based on your choice. For example:

- if you pick a categorical variable and a numerical variable, you will likely get a bar chart
- if you pick a date column and a numerical variable, you will get a line chart
- if you pick just a numerical variable, you will likely get a density plot
- if you pick two numerical variables, you might get a scatter plot, regression etc.

Sometimes a specific chart is not drawn, though. For example, if you combine a categorical variable with a numerical variable and there are too many categories, or some categories have a very small sample size, you might just get a ranking chart with less statistical insight. A chart you expect might also not be drawn if, for example, the bars are all the same or N/A. There are certain "checks" in place; these can, however, be overridden in the exploration mode.

Bear in mind that although ChartPixel has many different chart types and we are continuously adding more, they are mostly classic chart types, not infographics or speciality charts. We currently don't display network graphs.
That is, ChartPixel does not do the deep textual analysis that a network graph would require (this may change in the future). It does show word clouds from comments, though, and, if combined with dates, may even show you trends in keywords.

2. ChartPixel does a lot of wrangling and cleaning behind the scenes, and not all of it is explained in detail here. Here are some common things it does:

- groups numerical variables (if they are continuous enough) to offer more chart types
- removes duplicates
- removes rows that appear odd; for example, if a single unidentified string is found in a column of a large dataset
- converts textual numbers into numbers; for example, 2 dollars 50 cents becomes 2.5, with the column name changed to indicate it's in dollars
- converts units into numbers, for example, 2M 250T becomes 250000; the same is true for some standard units it finds, e.g., meters, speed units, currencies
- checks dates for potential mismatches (accidental month/day reversals)
- groups similar columns so they can be represented in one graph, for example, if you have a questionnaire
- semantically ranks categories; for example, column values such as "strongly agree", "disagree", "neither agree nor disagree", or "high school", "bachelor", "PhD" are put in order to produce better charts and extract meaningful trends
- handles panel data if you have date columns
- extracts multiple tables from one spreadsheet
- parses countries so they are recognized as such, and creates regions, sub-regions etc.
- parses percentages; e.g., if you have 20% as a value, it becomes numerical for calculations, while the charts will still show you the percentage sign
- renames long column names for better visualizations
- tries to parse or merge additional headers in your dataset to make it "tabular"
- combines certain columns semantically, for example, name and last name, to create unique identifiers. It also creates new columns, for example, age from d.o.b.
- converts textual dates into dates, to the best of its abilities, so that they can be represented on timelines, e.g., January 2024, Feb 2024, 2nd January etc.
- converts some columns into True/False for better charts; for example, when you have a questionnaire and a string column from a multi-select question representing one option
- converts missing values into N/A contextually

This is a small selection of what ChartPixel checks. The algorithms behind these transformations have been continuously improved over time, and we keep refining them with input from our users. It works better for some datasets than for others. Of course, if you drop in a dataset that is already mostly "clean", the results will be better, as there are fewer unknowns to deal with in the whole process.

Hope this is helpful! Let us know if you have more questions, we are more than happy to answer.
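
For readers who think in code, here is a rough sketch of two ideas from the answer above: picking a chart from the types of the selected columns (point 1) and turning textual values into numbers (point 2). This is not ChartPixel's actual code. The function names, the type labels, and the regular expressions are all hypothetical, and the choice to turn 20% into 20.0 rather than 0.2 is an assumption made only for illustration.

```python
import re
from typing import Optional, Sequence

# Hypothetical helpers for illustration only; NOT ChartPixel's implementation.


def suggest_chart(column_types: Sequence[str]) -> str:
    """Map selected column types to the chart named in the answer's examples."""
    kinds = sorted(column_types)  # make the result independent of selection order
    if kinds == ["categorical", "numerical"]:
        return "bar chart"
    if kinds == ["date", "numerical"]:
        return "line chart"
    if kinds == ["numerical"]:
        return "density plot"
    if kinds == ["numerical", "numerical"]:
        return "scatter plot"
    return "no suggestion"


def parse_percent(value: str) -> Optional[float]:
    """Turn '20%' into 20.0 (assumed convention) so it can be used in calculations."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*%\s*", value)
    return float(match.group(1)) if match else None


def parse_textual_dollars(value: str) -> Optional[float]:
    """Turn '2 dollars 50 cents' into 2.5; return None if the text does not match."""
    match = re.fullmatch(
        r"\s*(\d+)\s*dollars?(?:\s*(?:and\s+)?(\d+)\s*cents?)?\s*",
        value,
        flags=re.IGNORECASE,
    )
    if not match:
        return None
    dollars = int(match.group(1))
    cents = int(match.group(2) or 0)  # the cents part is optional
    return dollars + cents / 100


if __name__ == "__main__":
    print(suggest_chart(["numerical", "categorical"]))
    print(parse_percent("20%"))
    print(parse_textual_dollars("2 dollars 50 cents"))
```

A real tool would need many more rules than this (sample-size checks, too many categories, unit detection and so on); the sketch only shows the shape of the rule-based part described in the answer.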
