Big Data – a Review from CloudTimes
Big Data is certainly one of the new topics around cloud computing. The Strata conference in Santa Clara, CA, featured Big Data prominently and made it the main topic of the three-day event.
Key topics were:
- Becoming a data-driven organization
- Data’s evolution from research to product
- Applications, case studies, and cautionary tales
- Distributed data processing, Hadoop ecosystem
- Data acquisition, crowdsourcing, cleaning, distribution and markets
- Machine learning
- Real-time data processing and analytics
- Data science best practice
- Visualization and design principles
- Augmented reality and immersive interfaces
- Data protection, privacy, and policy
- Training and recruitment of data scientists
According to Wikipedia,
Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing. This trend continues because of the benefits of working with larger and larger datasets allowing analysts to “spot business trends, prevent diseases, combat crime.” Though a moving target, current limits are on the order of terabytes, exabytes and zettabytes of data. Scientists regularly encounter this problem in meteorology, genomics, biological research, Internet search, finance and business informatics. Data sets also grow in size because they are increasingly being gathered by ubiquitous information-sensing mobile devices, “software logs, cameras, microphones, RFID readers, wireless sensor networks and so on.”
CloudTimes attended the event, covered it closely on Twitter, and posted several videos and photos (see above).
We have also covered Big Data on our site. See some of the latest posts below, and follow our coverage in the “Big Data” category:
EMC Greenplum To Announce Free Community Edition of ‘Big Data’ at Strata
EMC Corporation will announce today, at the O’Reilly Strata Conference in Santa Clara, a free Community Edition of the EMC® Greenplum® Database, the industry-leading, high-performance massively parallel processing (MPP) database product, along with free analytic algorithms and data mining tools.
Big Data Analytics and the Cloud
Cloud computing helps organizations store, manage, share, and analyze their Big Data in an affordable and easy-to-use way. Today’s cloud Infrastructure-as-a-Service (IaaS) providers such as Microsoft, GoGrid, Amazon, Google, Rackspace and Slicehost, supported by the on-demand analytics solution vendors, make Big Data analytics very affordable.
View all Strata Conference videos about Big Data on our YouTube channel. Here are some examples:
Strata Conference 2011 – Sudhir Hasbe (Microsoft), Bruno Aziza (Microsoft)
Microsoft DataMarket: Leveraging cloud to deliver public domain and commercial data to millions
Windows Azure Marketplace includes data, imagery, and real-time web services from leading commercial data providers and authoritative public data sources. Customers have access to datasets covering areas such as demographics, environment, finance, retail, weather, and sports. Developers can use data from DataMarket to build applications for platforms such as PCs, servers, Azure, Windows Phone, iPhone, and iPad. Developers can access the data as a service through the industry-standard OData API. Information workers can analyze the data using tools like Excel, PowerPivot, and third-party applications. DataMarket also includes visualizations and analytics to enable insight on top of the data.
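As a rough illustration of what accessing a dataset through an OData API can look like, the sketch below composes a query URL from the standard OData system query options `$filter`, `$select`, and `$top`. The service root, the `Weather` entity set, and its `City` and `Date` fields are hypothetical placeholders, not actual DataMarket endpoints, and `build_odata_url` is a helper invented for this example.

```python
from urllib.parse import quote


def build_odata_url(service_root, entity_set, filter_expr=None, select=None, top=None):
    """Compose an OData query URL from standard system query options.

    service_root and entity_set are supplied by the caller; nothing here
    is specific to any real data provider.
    """
    options = []
    if filter_expr is not None:
        options.append(("$filter", filter_expr))      # row filter, e.g. "City eq 'Seattle'"
    if select:
        options.append(("$select", ",".join(select)))  # restrict the returned fields
    if top is not None:
        options.append(("$top", str(top)))             # limit the number of rows

    # Percent-encode each value; commas stay readable in $select lists.
    query = "&".join(f"{name}={quote(value, safe=',')}" for name, value in options)
    url = f"{service_root.rstrip('/')}/{entity_set}"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Hypothetical service root and entity set, for illustration only.
    print(build_odata_url(
        "https://example.invalid/data/v1",
        "Weather",
        filter_expr="City eq 'Seattle'",
        select=["City", "Date"],
        top=5,
    ))
```

The function only builds the URL and sends no request, so it can be tried without credentials. A real service would typically also require authentication and would define its own entity sets and field names.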
Strata Conference 2011 – Kim Rees (Periscopic)
Small is the New Big: Lessons in Visual Economy
While the majority of charts were designed to handle a variety of data without regard for implementation, there is a certain efficiency and novelty in presenting data in a very succinct way. By designing a presentation method restricted to specific data points, we can realize an economy of space and interface. It’s often practical to use the standard charts that are tried and true, but by eschewing the norms we can create visualizations that maximize interface real estate and provide a succinct view of our data.
There are many cases when addressing data visualization for a small space is beneficial — mobile accessibility, data journalism, and high risk/situational awareness applications to name a few. By considering the goals of the visualization and finely honing our designs, we can create highly useful interfaces that maximize comprehension and elevate judgment.
The following aspects of data-specific visualizations will be covered:
- Telling smaller stories with the data
- Encapsulating the design by choosing related data
- Designing for multiple encodings
- Abstracting visual context for brevity
- Working within constraints
- Goal-based visualization
- Small multiples