Data visualization is the representation of data in a systematic form that preserves its characteristics and variables for every unit of information. Visualization-based data discovery tools allow business users to combine different data sources to create custom analytical views. Advanced analytics can be integrated with these methods to allow the creation of animated, interactive graphics on devices such as smartphones, tablets, laptops, and desktops. See below for the advantages of data visualization reported by survey respondents.
A few points of guidance for visualization: (1) Do not forget the metadata; data about data can be very revealing. (2) Participation matters a great deal; visualization tools should be highly interactive, and user engagement is critical. (3) Encourage interactivity; static tools do not lead to as many discoveries as interactive tools do.
Big data refers to datasets so high in volume, velocity, and variety that they require new forms of processing to enable improved optimization, decision making, and insight discovery. The challenges of big data lie in capture, search, analysis, storage, sharing, and, most importantly, visualization. Data visualization can be regarded as the "front end" of big data. The most common data visualization myths are detailed below:
- All data need to be visualized: Do not rely too heavily on visualization; some data do not need visualization techniques to reveal their messages.
- Only good data need visualization: On the contrary, a quick and simple visualization can highlight errors in data just as well as it can reveal interesting trends.
- Data visualization always leads to the best decision: Data visualization cannot substitute for critical thinking.
- Data visualization leads to certainty: Visualizing the data does not mean it shows an accurate picture of what is important. Data visualization can be manipulated or twisted to different effects.
Data visualization methods are used to create tables, diagrams, images, and other intuitive representations of data. Visualizing big data is not as easy as visualizing traditional small datasets. Extensions of conventional visualization methods have already emerged, but they are not yet enough. In large-scale data visualization, many researchers use feature extraction and geometric modeling to greatly reduce data size before the actual rendering. Choosing the proper data representation is also very important when visualizing big data.
The goal here is to present new methods and advances in big data visualization: introducing conventional visualization techniques and their extensions to handle big data, discussing the challenges of big data visualization, and analyzing technological progress in the field.
Conventional Data Visualization Methods
Some conventional data visualization methods are frequently used: tables, histograms, scatter plots, line charts, bar charts, pie charts, area charts, flow charts, bubble charts, multiple data series or combinations of charts, timelines, Venn diagrams, data flow diagrams, entity-relationship diagrams, and so forth. In addition, some methods have been used that are less well known than those above: parallel coordinates, treemaps, cone trees, semantic networks, and so on.
Parallel coordinates plot individual data elements across many dimensions and are extremely useful for displaying multidimensional data. Figure 1 shows parallel coordinates. A treemap is an effective strategy for visualizing hierarchies. The size of each sub-rectangle represents one measure, while color is often used to represent another measure of the data. Figure 2 shows a treemap of a group of choices for streaming music and video tracks in a social network group. A cone tree is another technique for displaying hierarchical data, such as an organizational structure, in three dimensions; its branches grow as cones. A semantic network is a graphical representation of logical relationships between concepts. It generates a directed graph: a combination of nodes (vertices), edges (arcs), and a label over each edge.
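To make the treemap idea concrete, here is a minimal sketch (not from the source, and simplified to the basic "slice-and-dice" layout rather than the squarified layouts real tools use) of how a rectangle is partitioned into sub-rectangles whose areas are proportional to the values they represent:

```python
def slice_and_dice(values, x, y, w, h, horizontal=True):
    """Partition rectangle (x, y, w, h) into sub-rectangles whose
    areas are proportional to the given values (slice-and-dice layout)."""
    total = float(sum(values))
    rects, offset = [], 0.0
    for v in values:
        frac = v / total
        if horizontal:                      # slice along the x axis
            rects.append((x + offset, y, w * frac, h))
            offset += w * frac
        else:                               # slice along the y axis
            rects.append((x, y + offset, w, h * frac))
            offset += h * frac
    return rects

# Example: lay out three items of sizes 3, 1, 1 in a unit square.
layout = slice_and_dice([3, 1, 1], 0, 0, 1, 1)
```

A full treemap applies this recursively, alternating the slicing direction at each hierarchy level, with color encoding a second measure.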
Visualizations are not just static; they can be interactive. Interactive visualization can be performed through approaches such as zooming (zoom in and zoom out), overview and detail, zoom and pan, and focus+context (fisheye). The steps for interactive visualization are as follows:
1. Selecting: Interactive selection of data entities, a subset, or the entire dataset according to the user's interest.
2. Linking: Useful for relating data among multiple views. An illustration is shown in Figure 3.
3. Filtering: Helps users adjust the amount of data displayed. It reduces the data amount and focuses on data of interest.
4. Rearranging or remapping: Because spatial layout is the most important visual mapping, rearranging the spatial layout of the data is very effective in generating different insights.
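The selecting, linking, and filtering steps above can be sketched as operations over a shared record set. This is an illustrative toy (the record fields and helper names are my own, not from the source): a selection made in one view is propagated to other views by shared record ids.

```python
# One dict per data point; all views draw from the same records.
records = [
    {"id": 0, "region": "east", "sales": 120},
    {"id": 1, "region": "west", "sales": 80},
    {"id": 2, "region": "east", "sales": 45},
    {"id": 3, "region": "west", "sales": 200},
]

def select(records, predicate):
    """Selecting: ids of the records matching the user's interest."""
    return {r["id"] for r in records if predicate(r)}

def link(records, selected_ids):
    """Linking: another view marks the same ids as highlighted."""
    return [(r["id"], r["id"] in selected_ids) for r in records]

def filter_view(records, selected_ids):
    """Filtering: reduce the displayed data to the selection."""
    return [r for r in records if r["id"] in selected_ids]

sel = select(records, lambda r: r["sales"] > 100)   # brush in view 1
highlights = link(records, sel)                     # view 2 reacts
```

Real tools attach these operations to mouse events, but the data flow, a selection set shared between views, is the same.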
New database technologies and promising Web-based visualization approaches may be essential for reducing the cost of visualization generation and enabling it to enhance the scientific process. Thanks to Web-based linking technologies, visualizations change as the data change, which greatly reduces the effort needed to keep visualizations timely and up to date. These "low-end" visualizations have often been used in business analytics and open government data systems, but they have generally not been used in the scientific process, and many visualization tools available to scientists do not permit live linking the way these Web-based tools do.
Challenges of Big Data Visualization
Scalability and dynamics are two major challenges in visual analytics. Table 2 shows the research status for static data and dynamic data according to data size. For big dynamic data, solutions to type A problems or type B problems alone often do not work for combined A-and-B problems.
Visualization-based approaches take the challenges presented by the "four Vs" of big data and turn them into the following opportunities:
• Volume: Methods are developed to work with an immense number of datasets and to extract meaning from large volumes of data.
• Variety: Methods are developed to combine as many data sources as needed.
• Velocity: With these methods, businesses can replace batch processing with real-time stream processing.
• Value: The methods not only enable users to create attractive infographics and heatmaps, but also create business value by gaining insights from big data.
Visualizing big data with diversity and heterogeneity (structured, semi-structured, and unstructured) is a major issue. Speed is a desired factor in big data analysis, and designing a new visualization tool with efficient indexing is difficult. Cloud computing and advanced graphical user interfaces can be merged with big data for better management of its scalability.
Visualization systems must contend with unstructured data forms such as graphs, tables, text, trees, and other metadata; big data frequently comes in unstructured formats. Because of bandwidth limitations and power requirements, visualization should move closer to the data to extract meaningful information efficiently, and visualization software should run in an in situ manner. Because of the size of big data, the need for massive parallelization is a challenge in visualization; the challenge in parallel visualization algorithms is decomposing a problem into independent tasks that can run simultaneously.
Effective data visualization is a key part of the discovery process in the era of big data. For the challenges of high complexity and high dimensionality in big data, there are various dimensionality reduction methods; however, they may not always be applicable. The more dimensions that are visualized effectively, the higher the chances of recognizing potentially interesting correlations, patterns, or outliers.
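One cheap way to decide which dimension pairs are worth visualizing together, sketched here as an illustration (the function names and data are hypothetical, not from the source), is to scan all pairs for strong linear correlation and plot only the hits:

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.9):
    """Scan every pair of dimensions and report strongly correlated
    ones -- candidates worth plotting against each other."""
    hits = []
    for (na, a), (nb, b) in itertools.combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            hits.append((na, nb, r))
    return hits

data = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],   # perfectly correlated with x
    "z": [5, 1, 4, 2, 3],    # unrelated
}
pairs = correlated_pairs(data)   # only (x, y) survives the threshold
```

This only catches linear relationships; it complements, rather than replaces, visual inspection.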
Several problems accompany big data visualization:
• Visual noise: Most objects in the dataset are too close relative to each other; users cannot separate them as distinct objects on the screen.
• Data loss: Reducing the visible dataset is possible but leads to data loss.
• Large image perception: Data visualization methods are limited not only by the aspect ratio and resolution of the device, but also by the limits of physical perception.
• High rate of image change: Users observe the data but cannot react to the number or intensity of data changes on the display.
• High performance requirements: This is rarely an issue in static visualization because of its lower speed requirements, but it becomes a hard requirement for dynamic visualization.
Perceptual and interactive scalability are also challenges of big data visualization. Visualizing every data point can lead to over-plotting and may overwhelm users' perceptual and cognitive capacities; reducing the data through sampling or filtering can omit interesting structures or outliers. Querying large data stores can result in high latency, disrupting fluid interaction.
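When sampling is chosen as the reduction strategy, it has to work over data too large to hold in memory. A standard technique for that (illustrative here, not a method from the source) is reservoir sampling, which keeps a uniform sample of k items from a stream of unknown length in O(k) memory:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Uniform random sample of k items from a stream of unknown
    length, using O(k) memory (Vitter's algorithm R)."""
    rng = rng or random.Random(0)   # fixed seed for reproducibility
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # replace with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

# Downsample a million points to 100 for plotting.
subset = reservoir_sample(range(1_000_000), 100)
```

The caveat from the text applies: a uniform sample can miss rare outliers, so sampling is often combined with explicit outlier detection.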
In big data applications, it is hard to conduct data visualization because of the large size and high dimensionality of the data. Many current big data visualization tools show poor performance in scalability, functionality, and response time. Uncertainty, which arises during a visual analytics process, poses a great challenge to effective uncertainty-aware visualization.
Solutions to some of the challenges of visualizing big data include:
1. Addressing the need for speed: One possible solution is hardware; increased memory and powerful parallel processing can be used. Another strategy is placing data in memory while using a grid computing approach, where many machines are used.
2. Understanding the data: One solution is to have the appropriate domain expertise in place.
3. Addressing data quality: We must ensure that the data is clean through a process of data governance or information management.
4. Displaying meaningful results: One route is to cluster the data into a higher-level view where smaller groups of data become visible and the data can be properly visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or to create a separate chart for them.
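For the outlier-handling option, a common rule of thumb (shown here as an illustrative sketch; the quartile computation is a crude index-based approximation, and neither the function nor the data comes from the source) is Tukey's IQR fences, which split the data into inliers to plot normally and outliers to chart separately:

```python
def split_outliers(values, k=1.5):
    """Separate values into (inliers, outliers) using Tukey's IQR
    fences, so outliers can be removed or charted separately."""
    s = sorted(values)
    n = len(s)
    # Crude index-based quartiles; fine for an illustration.
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lo <= v <= hi]
    outliers = [v for v in values if v < lo or v > hi]
    return inliers, outliers

inliers, outliers = split_outliers([10, 12, 11, 13, 12, 11, 300])
# 300 lands in the outliers list; the rest stay in the main chart.
```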
Some Progress of Big Data Visualization
Concerning how visualization should be designed in the era of big data, visualization approaches should provide an overview first, then allow zooming and filtering, and finally provide deep details on demand. Visualization can play an important role in using big data to get a complete view of customers. Relationships are a vital aspect of many big data scenarios. Social networks are perhaps the most prominent example and are very hard to understand in text or tabular format; visualization, however, can make emerging network trends and patterns clear. A cloud-based visualization method was proposed to visualize the inherent relationships among users of a social network. The method intuitively presents the users' social relationships based on a connection matrix representing the hierarchical relationships of user nodes, and it uses Hadoop-based distributed parallel processing in the cloud, which speeds up the visualization of social network big data.
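The connection matrix mentioned above is just a symmetric user-by-user table built from an edge list. A minimal sketch (the data and function name are hypothetical, not taken from the proposed system):

```python
def connection_matrix(users, edges):
    """Build a symmetric user-to-user connection matrix from an edge
    list -- the structure a network visualization can be drawn from."""
    index = {u: i for i, u in enumerate(users)}
    n = len(users)
    m = [[0] * n for _ in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = 1
        m[index[b]][index[a]] = 1   # undirected: mirror the entry
    return m

users = ["ann", "bob", "eve"]
m = connection_matrix(users, [("ann", "bob"), ("bob", "eve")])
```

At social-network scale, building and aggregating such a matrix is exactly the kind of job that gets distributed across a Hadoop cluster.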
Big data visualization can be performed through approaches such as multiple views per representation display, dynamic changes in the number of factors, and filtering (dynamic query filters, star-field displays, and tight coupling), and so on. Several visualization methods have been analyzed and classified according to data criteria: (1) large data volume, (2) data variety, and (3) data dynamics.
Treemap: Based on the space-filling visualization of hierarchical data.
Circle packing: A direct alternative to the treemap. Its primitive shape is the circle, and circles can be nested inside circles from a higher hierarchy level.
Sunburst: Uses the treemap visualization converted to a polar coordinate system. The main difference is that the variable parameters are not width and height, but radius and arc length.
Parallel coordinates: Allows visual analysis to be extended to many data factors for different objects.
Streamgraph: A kind of stacked area chart that is displaced around a central axis, resulting in a flowing, organic shape.
Circular network diagram: Data objects are placed around a circle and linked by arcs according to the degree of their relatedness. Different line widths or color saturations are typically used to encode object relatedness.
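The sunburst's polar conversion can be made concrete with a short sketch (illustrative only, not from the source): at one hierarchy level, each sibling's size is mapped to an angular span so that arc length plays the role the width plays in a treemap.

```python
import math

def sunburst_arcs(sizes, radius=1.0):
    """Convert sibling sizes at one hierarchy level into polar arcs:
    each item gets a (start_angle, end_angle, radius) span whose arc
    length is proportional to its size."""
    total = float(sum(sizes))
    arcs, angle = [], 0.0
    for s in sizes:
        span = 2 * math.pi * s / total
        arcs.append((angle, angle + span, radius))
        angle += span
    return arcs

# Three siblings of sizes 1, 1, 2 share one full ring.
arcs = sunburst_arcs([1, 1, 2])
```

Child levels repeat the computation at a larger radius, subdividing their parent's angular span.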
Traditional data visualization tools are often insufficient to handle big data, so methods for interactive visualization of big data have been presented. First, a design space of scalable visual summaries using data reduction approaches (such as binned aggregation or sampling) was described for visualizing a variety of data types. Methods were then developed for interactive querying (e.g., brushing and linking) among binned plots through a combination of multivariate data tiles and parallel query processing. The resulting methods were implemented in imMens, a browser-based visual analysis system that uses WebGL for data processing and rendering on the GPU.
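The core idea of binned aggregation is simple enough to sketch (this is an illustrative toy, not imMens's actual implementation): raw points are collapsed into a fixed grid of counts, and only the grid, whose size is independent of the number of points, is shipped to the renderer.

```python
def bin2d(points, x_bins, y_bins, x_range, y_range):
    """Aggregate raw (x, y) points into a fixed grid of counts.
    The grid size is independent of the number of input points."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * x_bins for _ in range(y_bins)]
    for x, y in points:
        # Clamp the top edge into the last bin.
        bx = min(int((x - x0) / (x1 - x0) * x_bins), x_bins - 1)
        by = min(int((y - y0) / (y1 - y0) * y_bins), y_bins - 1)
        grid[by][bx] += 1
    return grid

pts = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9)]
grid = bin2d(pts, 2, 2, (0.0, 1.0), (0.0, 1.0))
```

Rendering the grid as a heatmap avoids over-plotting, and brushing over bins instead of points keeps interaction latency bounded.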
Many big data visualization tools run on the Hadoop platform. The common modules in Hadoop are Hadoop Common, the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce. They analyze big data efficiently but lack adequate visualization. Some software with visualization and interaction functions for big data has been developed:
Pentaho: Supports a spectrum of business intelligence functions such as analysis, dashboards, enterprise-class reporting, and data mining.
Flare: An ActionScript library for creating data visualizations that run in the Adobe Flash Player.
JasperReports: Has a novel software layer for generating reports from big data stores.
Cloudera and Datameer Analytics Solution: Datameer and Cloudera have partnered to make it easier and faster to put Hadoop into production and to help users harness the power of Hadoop.
Platfora: Converts raw big data in Hadoop into an interactive data processing engine, with the modular functionality of an in-memory data engine.
ManyEyes: A visualization tool launched by IBM. Many Eyes is a public website where users can upload data and create interactive visualizations.
Tableau: A business intelligence (BI) software tool that supports interactive and visual analysis of data, with an in-memory data engine to accelerate visualization.
Tableau has three main products for processing large-scale datasets: Tableau Desktop, Tableau Server, and Tableau Public. Tableau also embeds Hadoop infrastructure: it uses Hive to structure queries and caches data for in-memory analytics. Caching helps reduce the latency of a Hadoop cluster, so Tableau can provide an interactive mechanism between users and big data applications.
Most big data processing tools can normally process ZB (zettabytes) and PB (petabytes) of data, but they often cannot visualize it. Current big data processing tools include Hadoop, High Performance Computing and Communications, Storm, Apache Drill, RapidMiner, and Pentaho BI. Data visualization tools include NodeBox, R, Weka, Gephi, the Google Chart API, Flot, D3, and Visual.ly, among others. A big data visualization algorithm analysis integrated model based on RHadoop has been proposed. The integrated model can process ZB and PB data and show valuable results through visualization, and it is suitable for designing parallel algorithms for ZB- and PB-scale data.
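The MapReduce pattern that Hadoop and RHadoop build on can be sketched in a few lines (an illustrative single-process toy, not the actual framework API): independent map tasks emit key-value pairs, which are merged in a reduce step. Here it computes a histogram, a typical pre-aggregation feeding a visualization.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit (key, 1) pairs from one partition of the data."""
    return [(word, 1) for word in chunk]

def reduce_phase(pairs):
    """Reduce: sum the counts for each key after the shuffle."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Two "partitions" processed independently (on a cluster, in
# parallel), then merged -- the shape of a MapReduce histogram job.
chunks = [["a", "b", "a"], ["b", "b", "c"]]
pairs = [p for chunk in chunks for p in map_phase(chunk)]
histogram = reduce_phase(pairs)
```

Because the map tasks never communicate with each other, the same code scales out simply by adding partitions and machines, which is why the pattern suits ZB- and PB-scale pre-aggregation.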
Interactive visual cluster analysis is the most intuitive way to discover clustering patterns. The most difficult step is visualizing multidimensional data while enabling users to interactively explore the data and identify clustering structures. Enhanced star-coordinate visualization models for effective interactive cluster exploration of big data have been developed. Star-coordinate models are probably the most scalable approach for visualizing large datasets compared with other multidimensional visualization methods, such as the scatter-plot matrix and parallel coordinates:
• Parallel coordinates and the scatter-plot matrix are typically used for fewer than ten dimensions, while star coordinates can handle tens of dimensions.
• Star-coordinate visualization can scale up to many points with the help of density-based representation.
• Star-coordinate-based cluster visualization does not attempt to compute pairwise distances between records; it uses the properties of the underlying mapping model to partially preserve distance relationships. This is very useful in processing big data.
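The star-coordinate mapping itself is a linear projection and can be sketched directly (an illustrative basic form, not the enhanced models discussed above): each dimension becomes an axis spread evenly around a circle, and a record's 2-D position is the vector sum of its values along those axes.

```python
import math

def star_project(row, n_dims):
    """Map one n-dimensional record to a 2-D point: each dimension is
    an axis spread evenly around a circle, and the point is the vector
    sum of the (assumed normalized) values along those axes."""
    x = y = 0.0
    for i, v in enumerate(row):
        angle = 2 * math.pi * i / n_dims
        x += v * math.cos(angle)
        y += v * math.sin(angle)
    return x, y

# Four dimensions -> axes at 0, 90, 180, and 270 degrees.
p = star_project([1.0, 0.0, 0.0, 0.0], 4)   # falls on the first axis
q = star_project([1.0, 1.0, 1.0, 1.0], 4)   # symmetric, maps to origin
```

Because this is a single pass over each record with no pairwise distance computation, its cost grows linearly with the number of records, which is the scalability property the text highlights.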
Direct visualization of big data sources is often impractical or ineffective; analytics plays a key part by reducing the size and complexity of big data. Visualization and analytics can be integrated so that they work best together. IBM has embedded visualization capabilities into its business analytics solutions, made possible by the IBM Rapidly Adaptive Visualization Engine (RAVE). Extensible visualization and RAVE capabilities enable effective visualizations that provide a better understanding of big data. IBM products such as IBM® InfoSphere® BigInsights™ and IBM SPSS® Analytic Catalyst employ visualization libraries and RAVE to support interactive visualizations that can help extract extraordinary insight from big data. InfoSphere BigInsights is software that helps analyze and discover business insights hidden in big data. SPSS Analytic Catalyst automates big data preparation, chooses appropriate analytic procedures, and displays results through interactive visualization.
The use of virtual reality (VR) platforms for scientific data visualization has been the subject of continuous research, covering both software and inexpensive commodity hardware. These potentially powerful and innovative tools for multi-dimensional data visualization can provide an easy path to collaborative data visualization. Immersion provides benefits beyond conventional "desktop" visualization tools: it results in a better view of the data-scape geometry and more intuitive data understanding. Immersive visualization should become one of the foundations for exploring the higher dimensionality and abstraction that accompany big data. Fundamental human pattern recognition (visual discovery) skills should be constantly augmented by the emerging technologies associated with immersive virtual reality.
A SWOT (Strengths, Weaknesses, Opportunities, and Threats) analysis is a well-known technique for ensuring that both positive and negative factors are identified. A SWOT analysis of the above software tools for big data visualization has been conducted; Strengths and Opportunities are positive factors, while Weaknesses and Threats are negative factors.
Visualizations can be static or dynamic. Interactive visualizations often lead to discovery and do a better job than static data tools. Interactive visualizations can help extract great insight from big data; interactive brushing and linking between visualization approaches and networks or Web-based tools can facilitate the scientific process. Web-based visualization helps capture dynamic data in a timely fashion and keep visualizations up to date.
The extension of conventional visualization approaches to handle big data is far from sufficient in functionality; more new methods and tools for big data visualization should be developed for diverse big data applications. Advances in big data visualization have been presented, and a SWOT analysis of current visualization software tools for big data has been conducted in this paper; this will help in developing new methods and tools for big data visualization. Big data analytics and visualization should be tightly integrated to work best for big data applications. Immersive virtual reality is a new and powerful technique for handling high dimensionality and abstraction, and it will greatly facilitate big data visualization.