What Is Big Data?
Big data describes the flood of largely unstructured data created by everything from social media posts and network traffic to the Internet of Things (IoT), public safety cameras, and global weather data. Unlike small data, which can be structured, stored, and analyzed in a relational database, big data is too large, too fast, and too varied to be handled as conventional tables, rows, and columns.
Small data and big data lie on a spectrum. You know you’ve entered the big data realm when you see extreme data volume, velocity, and variety.
Big Data Volume
As you might have guessed, big data is big. Huge, in fact. Big data sets easily exceed a petabyte (1,000 terabytes) and can reach into the exabytes (1,000 petabytes). Data sets this large are beyond human comprehension and traditional computing capacity. Making sense of big data—identifying meaningful patterns, extracting insights, and putting it all to work—requires machine learning, AI, and serious computing power.
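To give a concrete sense of that scale, here is a quick arithmetic sketch in Python using the decimal conversions quoted above; the five-megabyte photo size is just an illustrative assumption.

```python
# Rough scale of big data volumes, using decimal (SI) units as in the text:
# 1 TB = 1,000 GB, 1 PB = 1,000 TB, 1 EB = 1,000 PB.
GB = 10**9          # bytes in a gigabyte
TB = 1_000 * GB     # terabyte
PB = 1_000 * TB     # petabyte
EB = 1_000 * PB     # exabyte

# Illustrative question: how many 5 MB photos fit in one petabyte?
photo_size = 5 * 10**6
print(f"1 PB holds roughly {PB // photo_size:,} five-megabyte photos")  # ~200,000,000
```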
Big Data Velocity
Big data doesn’t arrive in a daily expense report or a month’s worth of transaction data. Big data comes in real time in extremely high volumes. An example: Google receives, on average, over 40,000 search queries per second,1 analyzes them, answers them, and serves up analytics-driven advertising for each and every one. That’s big data velocity.
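A quick back-of-the-envelope calculation, using the average figure cited above, shows how that per-second rate compounds over a single day.

```python
# Back-of-the-envelope arithmetic for the search-query example above.
queries_per_second = 40_000          # average figure cited in the text
seconds_per_day = 24 * 60 * 60       # 86,400

queries_per_day = queries_per_second * seconds_per_day
print(f"{queries_per_day:,} queries per day")  # 3,456,000,000 -- roughly 3.5 billion
```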
Big Data Variety
On top of coming in petabytes per second, big data comes in every conceivable data type, format, and form. Big data includes pictures, video, audio, and text. Big data can be structured, like census data, or completely unstructured, like pictures from social posts.
Big data could come from video posts, the sensors in a factory, or all the cell phones using a specific app.
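As a rough illustration of what that variety looks like in practice, here is a minimal Python sketch that pulls tabular, semi-structured, and binary data into the same pipeline; the file names and formats are hypothetical.

```python
# A minimal sketch of the "variety" problem: one pipeline may need to ingest
# tabular, semi-structured, and binary data side by side. File names are hypothetical.
import json
import pandas as pd

census = pd.read_csv("census.csv")                                  # structured, tabular
posts = [json.loads(line) for line in open("social_posts.jsonl")]   # semi-structured JSON
with open("storefront.jpg", "rb") as f:
    image_bytes = f.read()                                          # unstructured binary (an image)

print(census.shape, len(posts), len(image_bytes))
```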
Why Is Big Data Important?
Big data is important because analyzing it unlocks information and insights that are beyond human perception and the ability of traditional database analytics.
For example, a person can look at a thermometer and decide if they should wear a warm hat. A database can hold a decade of daily temperatures, cross-reference temperature with hat sales, then project how many hats a retailer should order for October vs. November.
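A minimal sketch of that small-data workflow in Python with pandas might look like the following; the file and column names are hypothetical.

```python
# Cross-reference a decade of daily temperatures with hat sales and project
# order quantities for October vs. November. File and column names are hypothetical.
import pandas as pd

temps = pd.read_csv("daily_temps.csv", parse_dates=["date"])    # columns: date, avg_temp_c
sales = pd.read_csv("hat_sales.csv", parse_dates=["date"])      # columns: date, hats_sold

daily = temps.merge(sales, on="date")
daily["month"] = daily["date"].dt.month

# Average hats sold per day in each month, scaled to a monthly order quantity.
monthly_avg = daily.groupby("month")["hats_sold"].mean()
print("Suggested October order:", round(monthly_avg[10] * 31))
print("Suggested November order:", round(monthly_avg[11] * 30))
```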
Big data analytics can review selfies as they post to social media; identify hats, hat material, and hat style; and then recommend which hats are trending—plus analyze global weather patterns and predict the chance of snow.
Big Data Use Cases
Fraud Detection
Banks, credit card companies, retailers, payment processors, and regulators use big data analytics to analyze real-time transaction data for signs of fraudulent activity. Machine learning algorithms can detect suspicious patterns, freeze accounts, and notify customers that their account may have been compromised. For example, PayPal is using big data analytics to help improve fraud detection accuracy and decrease fraud detection time.
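As a rough illustration of the technique (not PayPal's actual system), the sketch below uses scikit-learn's IsolationForest to flag anomalous transactions in synthetic data; the feature names and thresholds are purely illustrative.

```python
# Anomaly-based fraud screening on synthetic transaction records.
# Real systems combine many models, rules, and human review.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic features: [amount_usd, seconds_since_last_txn, distance_from_home_km]
normal = rng.normal(loc=[40, 3_600, 5], scale=[20, 1_200, 3], size=(5_000, 3))
suspicious = np.array([[2_500, 30, 4_000], [900, 15, 2_500]])  # large, rapid, far away
transactions = np.vstack([normal, suspicious])

model = IsolationForest(contamination=0.001, random_state=0).fit(transactions)
flags = model.predict(transactions)          # -1 = anomaly, 1 = normal
print("Flagged transaction indices:", np.where(flags == -1)[0])
```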
Predictive Analytics
Video cameras, microphones, and other sensors can monitor practically any machine—a jet engine, factory equipment, an automobile—and capture data about its performance, movement, and environment. When coupled with machine learning and AI, this unstructured data can be used to identify early signs of wear, spot faults before equipment fails, and—in the case of automotive safety systems—actively intervene to prevent accidents.
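A minimal sketch of that idea, using synthetic vibration readings and a simple rolling baseline rather than a full machine learning pipeline, might look like this:

```python
# Predictive-maintenance style monitoring: watch a vibration signal and flag
# readings that drift well above the recent baseline. Values are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
vibration = rng.normal(1.0, 0.05, 1_000)      # healthy baseline (arbitrary units)
vibration[900:] += np.linspace(0, 0.5, 100)   # gradual drift as a bearing wears

signal = pd.Series(vibration)
baseline = signal.rolling(window=100).mean()
alerts = signal[signal > baseline.shift(1) * 1.2]   # 20% above recent baseline

print(f"First early-warning reading at sample {alerts.index.min()}")
```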
Spatial Analysis and Public Safety
Machine learning is being used in large public settings like malls, stadiums, and transit facilities to extract real-time information from security video. These big data analytics systems use computer vision AI to analyze foot traffic, identify bottlenecks, and spot unsafe situations. The resulting insights can be used to understand retail performance, shift staff to support areas of high demand, or alert first responders if public safety is threatened. The Chicago Transit Authority is using big data and machine learning to help make the public transportation experience faster, smoother, and safer.
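As a hedged illustration (not the CTA's actual system), the sketch below uses OpenCV background subtraction to count moving regions in a video feed; the video path is hypothetical, and production systems use far more robust detection and tracking.

```python
# Count moving regions per frame as a crude foot-traffic signal.
import cv2

cap = cv2.VideoCapture("concourse_camera.mp4")      # hypothetical video source
subtractor = cv2.createBackgroundSubtractorMOG2()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                                   # foreground mask
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    movers = [c for c in contours if cv2.contourArea(c) > 500]       # ignore tiny blobs
    print(f"Moving regions this frame: {len(movers)}")

cap.release()
```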
Network Performance
The performance of telecom, wireless, and computer networks is an ideal big data use case. Every packet traversing the network produces real-time performance data that automated systems can analyze to spin up additional network resources and optimize performance. Over longer time horizons, big data insights can help network builders identify new infrastructure needs and prioritize investments.
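A minimal sketch of that kind of automated monitoring, with illustrative thresholds and simulated latency samples, might look like this:

```python
# Track recent latencies and flag when the 95th percentile crosses a target.
# Thresholds and sample values are illustrative only.
from collections import deque
import statistics

WINDOW = 1_000                    # keep the most recent samples
P95_TARGET_MS = 50.0              # assumed service-level target
latencies = deque(maxlen=WINDOW)

def record(latency_ms: float) -> None:
    latencies.append(latency_ms)
    if len(latencies) >= 100:
        p95 = statistics.quantiles(latencies, n=20)[18]   # ~95th percentile
        if p95 > P95_TARGET_MS:
            print(f"p95={p95:.1f} ms above target; requesting more capacity")

for sample in [12, 18, 25, 60, 75, 80] * 30:   # simulated measurements
    record(sample)
```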
Sentiment and Awareness
Marketers and pollsters use big data analytics to monitor publicly available online postings in social media, forums, and reviews to identify trends, hot topics, and public sentiment. Of course, social media companies use even more sophisticated big data analytics to produce finer-grained sentiment and demographic insights.
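One simple, common way to score sentiment on public posts is NLTK's VADER analyzer; the sketch below is a minimal illustration with made-up posts, not the finer-grained systems described above.

```python
# Score a handful of example posts with NLTK's VADER sentiment analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

posts = [
    "Loving the new winter hats from this brand!",
    "Worst checkout experience I've had all year.",
]
for post in posts:
    score = sia.polarity_scores(post)["compound"]   # -1 (negative) to +1 (positive)
    print(f"{score:+.2f}  {post}")
```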
What Is Small Data?
Small data is data that can be structured and managed in a relational database, whether one of the many SQL-based systems, Oracle Database, Microsoft Access, or even a basic spreadsheet. Don’t be fooled by the word “small,” however. Small data comes in gigabyte to terabyte volumes. Inventory, transactions, customer records, order history, and sales performance are all examples of small data.
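As a minimal illustration of small data at work, the sketch below stores a few structured records with Python's built-in sqlite3 module and answers a question with plain SQL; the table and values are made up.

```python
# Structured records in a relational database, queried with plain SQL.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Acme", 80.0), (3, "Globex", 45.5)],
)

# Revenue per customer, highest first.
for customer, revenue in con.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY 2 DESC"
):
    print(customer, revenue)
```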
Why Is Small Data Important?
Small data houses big business value. Billion-dollar companies can extract the bulk of their business insight from the “small” structured data they collect through their operations. A well-designed traditional database can power real-time services such as shopping cart recommendations, live dashboards, and financial transactions.
Small Data Use Cases
Patient Wellness
While big data can help healthcare systems detect things like billing errors, fraud, and inefficiencies, small data can help quantify individual patient progress, the effectiveness of medications, and compliance with treatment plans.
Business Operations and Efficiency
Any industry that produces transaction and event data, such as the travel and hospitality industries, can extract insights using standard databases and small data analytics. You don’t need big data techniques and AI to analyze on-time departures, table turn times, or vacancy rates. Small data analytics in these industries can drive applications that keep travelers updated on their flight status, help diners make reservations, and let guests know when their rooms are ready.
Supply Chain and Logistics
Since the advent of barcodes, optical character recognition (OCR), and radio frequency identification (RFID), supply chains and delivery services have produced constant data about the location, movements, and status of items. This is all small data, even though the volume and velocity can push into big-data terrain for global shipping firms. Why? Because the data is structured and uniform. Small data analytics in logistics can power automated sorting machines, send packages to the correct destination, and keep recipients informed about their order’s progress.
Sales and Customer Relationship Management (CRM)
Sales and CRM databases are excellent examples of small data analytics at work. The data is relatively homogeneous and structured, yet it can yield major business insights. Do orders go up when salespeople call on customers more frequently? Which salespeople close more deals? Which customers produce higher margins? The answers lie in the small data produced by calendar activity and sales transactions plus customer and employee profiles.
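A hedged sketch of how the first two questions might be answered with small data and pandas; the file and column names are hypothetical.

```python
# Do orders rise when salespeople call on customers more often?
# Which salespeople close more deals? File and column names are hypothetical.
import pandas as pd

crm = pd.read_csv("crm_activity.csv")   # columns: customer_id, calls_last_quarter, orders_last_quarter
correlation = crm["calls_last_quarter"].corr(crm["orders_last_quarter"])
print(f"Call frequency vs. orders correlation: {correlation:.2f}")

deals = pd.read_csv("deals.csv")        # columns: salesperson, closed (0 or 1)
print(deals.groupby("salesperson")["closed"].sum().sort_values(ascending=False))
```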
Big Data vs. Small Data
This comparison table provides a quick reference on the key differences between small and big data and examples of how each could be applied in similar use cases.
| | Small Data | Big Data |
|---|---|---|
| Data volume | Gigabytes to terabytes | Petabytes to exabytes |
| Data velocity | Controlled and constant; collects over time | Large volumes at extremely high speeds |
| Data variety | Low: typically tabular, text data | High: tabular data, JSON, images, text, audio, video |
| Data quality | High: usually collected from defined, controlled sources | Unpredictable: comes from multiple, organic sources |
| Data cleaning, prep, optimization | Manual and automated processes (human programmed) | Machine learning algorithms, AI |
| Data structure | Often structured from the source, housed in a relational database | Unstructured mix of multiple data types |
| Data housing | Data mart, data warehouse, local or in the cloud | Data lakes, data fabrics in public, hybrid, or private clouds |
| Data analytics tools | Traditional databases, SQL | Machine learning, AI, data fabrics, SQL, Python, R, Java, Apache Spark |
| Computing needs | Ranges from a single server to requiring cloud resources | Parallel and distributed computing, clusters, cloud resources |
Sample Use Cases

| | Small Data | Big Data |
|---|---|---|
| General | Business intelligence, reporting, sales and CRM, insight- and data-driven transactions and decision-making | Data mining, predictive analytics, pattern recognition, sentiment analysis |
| Airlines | On-time performance, flight data, ticketing, CRM, loyalty programs | Brand perception on social media, aircraft maintenance, fuel efficiency, route planning and optimization |
| Shipping and logistics | Package tracking, automated sorting, picking, packing, status and fulfillment reporting, operational efficiency | Forecasting, package routing optimization, video analytics for loss prevention, worker safety |
| Healthcare | Individual patient progress, continuous quality improvement, clinical efficiency | Error and fraud detection, system-wide efficiency, large-scale health trends and outcomes analysis |
| Retail | Customer loyalty programs, product performance, promotions, smart transactions, loss prevention | Trend spotting, forecasting, fraud prevention, inventory and supply chain management, marketing |
| Finance | Individual business accounting and analytics, transaction analysis, real-time and historical insights | Fraud detection, high-volume trading analysis, AI-driven transactions |
Working with Data, Big and Small
Big data and small data each present unique challenges. Many of the issues we associate with getting the most from data—capturing it accurately, cleaning it, and structuring it into database-friendly forms, plus asking the right questions in the right way—are small data issues. The same basic processes that define getting data into a spreadsheet and making it usable apply to the bulk of data analytics.
Structuring and analyzing big data sets is beyond the ability of humans and human-defined computing tools like databases. The volume, variety, and velocity of big data require machine learning simply to parse and comprehend it. This lowers the amount of expert human labor and reduces data storage complexity. Big data doesn’t need the highly structured data warehouses used in small data. It can live in flat, wide, unstructured data lakes.
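As a rough illustration, the sketch below queries files sitting in a data lake with Apache Spark, one of the big data tools listed in the comparison table above; the storage paths, formats, and column names are hypothetical.

```python
# Query mixed-format files in flat data-lake storage with Apache Spark.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-example").getOrCreate()

clicks = spark.read.json("s3://example-lake/raw/clickstream/")   # semi-structured events
sales = spark.read.parquet("s3://example-lake/curated/sales/")   # columnar, structured

daily = (
    clicks.join(sales, on="session_id", how="left")
          .groupBy("event_date")
          .count()
)
daily.show()
```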
But data lakes can be immense, and analyzing big data requires powerful computing resources. Big data may require less human capital; however, storing exabytes of data and operating distributed computing systems is expensive, whether it’s on-premises or in the cloud.
Big Data Solutions and Resources
Intel supports big data and small data processing with hardware, software, and developer toolkits. Intel works closely with SAP, Microsoft, Oracle, and open source communities to make sure their database products and big data services are optimized for Intel® Xeon® processors. Intel also provides optimized distributions of open source big data applications and tools along with data science tools for small data.
SAP and Intel
SAP and Intel work together to deliver in-memory computing and maximum performance across on-premises, public cloud, and hybrid environments.
Microsoft and Intel
Intel and Microsoft ensure that open source and third-party database and big data solutions make the most of Azure Cloud services and that SQL Server is continuously optimized for the latest Intel® hardware.
Oracle and Intel
Oracle and Intel partner on Oracle Cloud Services, Oracle Database, and Exadata, plus the Oracle Machine Learning Module, to ensure Oracle products take advantage of the latest Intel® security, performance, and acceleration technologies.
Intel® CoFluent™ Technology
Intel® CoFluent™ technology is a simulation tool for modeling and optimizing big data computer clusters and networking.
Intel® oneAPI Base Toolkit
The Intel® oneAPI Base Toolkit is a cross-architecture development toolkit that simplifies development for mixed hardware architectures. It includes the Intel® oneAPI Data Analytics Library.
Intel® oneAPI HPC Toolkit
The HPC toolkit helps developers build, analyze, and scale applications across shared- and distributed-memory computing systems.
Intel® AI Analytics Toolkit
This toolkit helps accelerate open source data science and machine learning pipelines. It includes Intel® distributions and optimizations for Python, TensorFlow, and PyTorch.
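As a hedged example of how that acceleration is commonly enabled, the sketch below uses the Intel® Extension for Scikit-learn from the toolkit's Python offerings to patch standard scikit-learn estimators; check Intel's documentation for the exact components in your toolkit version.

```python
# Patch standard scikit-learn estimators to run on Intel-optimized kernels.
from sklearnex import patch_sklearn
patch_sklearn()                      # must run before importing sklearn estimators

from sklearn.cluster import KMeans
import numpy as np

data = np.random.default_rng(0).random((10_000, 8))   # synthetic feature matrix
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(data)
print(np.bincount(labels))           # cluster sizes
```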
Expect Big Things from Big Data
If the recent past is any indication, big data will continue to grow in volume, velocity, and variety. At the same time, increasing computing power and storage capacity will likely drive down costs and unlock more insight from more data.
This virtuous circle will make the benefits of big data analytics more accessible to more businesses, and more people, than ever before. Expect breakthroughs in medicine, science, economics, finance, and even gaming and entertainment as the patterns, meaning, and value hidden in the big data of everyday life are uncovered.