Posts Tagged ‘Big Data’

Big Data AI coronavirus

South Korea winning the fight against coronavirus using big-data and AI

South Korea is using the analysis, information, and references provided by this integrated data. The real-time responses and information produced by the platform are promptly conveyed to people through various AI-based applications.

Whenever someone tests positive for COVID-19, everyone in the vicinity receives the infected person’s travel details, activities, and commute maps for the previous two weeks through push notifications on their mobile phones.
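The excerpt does not say how the people “in the vicinity” are chosen. Purely as an illustration, assuming the platform knows each phone’s approximate coordinates and uses a fixed alert radius (the `Point` type, the `phones_to_notify` function, and the radius are all hypothetical), a proximity filter over an infected person’s route could use great-circle (haversine) distance:

```python
import math
from dataclasses import dataclass
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Point:
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(b.lon - a.lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 so rounding error can never push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def phones_to_notify(route: List[Point], phones: Dict[str, Point],
                     radius_km: float) -> List[str]:
    """Hypothetical rule: notify any phone within radius_km of any stop on the route."""
    return sorted(
        phone_id
        for phone_id, location in phones.items()
        if any(haversine_km(location, stop) <= radius_km for stop in route)
    )
```

This checks distance only; it ignores whether someone was nearby at the same time, which a real system would also have to consider.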

Turn Bad Data Into Good Data

How to Turn Bad Data Into Good Data

Date: Wednesday, January 22, 2020  Time: 1:00 pm CT

A panel of data and education experts will discuss how to make the most of your education data. In this webinar you’ll learn about:

  • How rapid data turnover can hurt you (and your bottom line)
  • How to access “good” data and what it looks like
  • Opportunities open to you when your data is clean
  • Avoiding the pitfalls of using outdated or irrelevant data and making decisions that are not data-informed
  • Navigating the unique challenges of working in education, such as privacy regulations that might hinder communication

more on big data in this IMS blog

Data-driven design

Valuing data over design instinct puts metrics over users

Benek Lisefski August 13, 2019

Overreliance on data to drive design decisions can be just as harmful as ignoring it. Data only tells one kind of story. But your project goals are often more complex than that. Goals can’t always be objectively measured.

Data-driven design is about using information gleaned from both quantitative and qualitative sources to inform how you make decisions for a set of users. Some common tools used to collect data include user surveys, A/B testing, site usage and analytics, consumer research, support logs, and discovery calls. 

Traditionally, designers justified their value through their innate talent for creative ideas and artistic execution. Those whose instincts reliably produced success became rock stars.

In today’s data-driven world, that instinct is less necessary and holds less power. But make no mistake, there’s still a place for it.

Data is good at measuring things that are easy to measure. Some goals are less tangible, but that doesn’t make them less important.

Data has become an authoritarian who has fired the other advisors who may have tempered his ill will. A designer’s instinct would ask, “Do people actually enjoy using this?” or “How do these tactics reflect on our reputation and brand?”

Digital interface design is going through a bland period of sameness.

Data is only as good as the questions you ask

When to use data vs. when to use instinct

Deciding between two or three options? This is where data shines. Nothing is more decisive than an A/B test to compare potential solutions and see which one actually performs better. Make sure you’re measuring long-term value metrics and not just views and clicks.
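A minimal sketch of the statistics behind such a comparison is a pooled two-proportion z-test, assuming each user in a variant either succeeds or fails on a long-term metric (for example, still active after 30 days, a hypothetical choice) rather than on a click. This is one standard test among several, and its normal approximation only holds for reasonably large samples:

```python
import math
from typing import Tuple


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test.

    Returns (z, two-sided p-value) for the null hypothesis that variants
    A and B share the same underlying success rate.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        # All successes or all failures in both groups: the test is undefined.
        raise ValueError("cannot test: pooled rate is 0 or 1")
    z = (rate_b - rate_a) / math.sqrt(variance)
    # Standard normal CDF via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value
```

With hypothetical counts of 120 retained users out of 2,400 for A and 156 out of 2,400 for B, `two_proportion_z_test(120, 2400, 156, 2400)` gives a z of about 2.2 and a two-sided p-value below 0.05.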

Sweating product quality and aesthetics? Turn to your instinct. The overall feeling of quality is a collection of hundreds of micro-decisions, maintained consistency, and precise execution. Each one of those decisions isn’t worth validating on its own. Your users aren’t design experts, so their feedback will be too subjective and variable. Trust your design senses when finessing the details.

Unsure about user behavior? Use data rather than asking for opinions. When asked what they’ll do, customers tend to say what they think you want to hear. Instead, trust what they actually do when they think nobody’s looking.

Building brand and reputation? Data can’t easily measure this. But we all know trustworthiness is as important as clicks (and sometimes they’re opposing goals). When building long-term reputation, trust your instinct to guide you to what’s appealing, even if it sometimes contradicts short-term data trends. You have to play the long game here.

more on big data in this IMS blog


NLP – natural language processing; ACL – Association for Computational Linguistics (ACL 2019)

Major trends in NLP: a review of 20 years of ACL research

Janna Lipenkova, July 23, 2019

The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)

Data: working around the bottlenecks

Large data is inherently noisy. In general, the more “democratic” the production channel, the dirtier the data, which means that more effort has to be spent on cleaning it. For example, data from social media requires a longer cleaning pipeline. Among other things, you will need to deal with extravagances of self-expression such as smileys and irregular punctuation, which are normally absent in more formal settings such as scientific papers or legal contracts.
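To make the idea of a longer cleaning pipeline concrete, here is a minimal sketch of a few normalization steps for social-media text. The placeholder tokens (`<url>`, `<user>`, `<smile>`, `<sad>`) and the tiny emoticon list are hypothetical choices; real pipelines use much larger lexicons and task-specific rules:

```python
import re

# Hypothetical emoticon lexicon; a real pipeline would use a much larger one.
EMOTICONS = {":-)": " <smile> ", ":)": " <smile> ", ":-(": " <sad> ", ":(": " <sad> "}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
REPEATED_PUNCT_RE = re.compile(r"([!?.])\1+")   # "!!!" -> "!"
ELONGATED_RE = re.compile(r"([a-zA-Z])\1{2,}")  # "soooo" -> "soo"; letters only, so "1000" is untouched
WHITESPACE_RE = re.compile(r"\s+")


def clean_social_text(text: str) -> str:
    """Normalize one social-media post into a more regular form."""
    text = URL_RE.sub(" <url> ", text)        # replace links first so later rules don't touch them
    text = MENTION_RE.sub(" <user> ", text)   # anonymize @-mentions
    for emoticon, token in EMOTICONS.items():
        text = text.replace(emoticon, token)  # map smileys to placeholder tokens
    text = REPEATED_PUNCT_RE.sub(r"\1", text)
    text = ELONGATED_RE.sub(r"\1\1", text)
    return WHITESPACE_RE.sub(" ", text).strip().lower()
```

For example, `clean_social_text("Sooooo happy!!! :) @bob https://example.com")` returns `"soo happy! <smile> <user> <url>"`.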

The other major challenge is the labeled data bottleneck. On the one hand, it can be addressed through crowd-sourcing and Training Data as a Service (TDaaS). On the other hand, a range of automatic workarounds for creating annotated datasets have also been suggested in the machine learning community.
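One family of automatic workarounds is weak supervision: instead of hand-labeling every example, you write cheap heuristic labeling functions and combine their votes. The sketch below uses a plain majority vote over hypothetical sentiment heuristics (`lf_positive_words`, `lf_negative_words`, `lf_negated_praise`); production approaches typically learn how far to trust each function instead of counting votes equally:

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Sequence

Label = Optional[str]  # None means "abstain" / "no label"
LabelingFunction = Callable[[List[str]], Label]


def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical heuristic labeling functions for a toy sentiment task.
def lf_positive_words(words: List[str]) -> Label:
    return "pos" if {"great", "love", "excellent"} & set(words) else None


def lf_negative_words(words: List[str]) -> Label:
    return "neg" if {"awful", "hate", "broken"} & set(words) else None


def lf_negated_praise(words: List[str]) -> Label:
    return "neg" if "not" in words and {"good", "great"} & set(words) else None


def weak_label(texts: Sequence[str], lfs: Sequence[LabelingFunction]) -> List[Label]:
    """Combine labeling-function votes by simple majority; ties and no-votes stay unlabeled."""
    labels: List[Label] = []
    for text in texts:
        words = tokens(text)
        votes = [vote for vote in (lf(words) for lf in lfs) if vote is not None]
        ranked = Counter(votes).most_common(2)
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            labels.append(None)  # no heuristic fired, or the vote is tied
        else:
            labels.append(ranked[0][0])
    return labels
```

On `["I love it", "I hate this", "The box is blue", "Not great, kind of broken"]` with all three functions, this yields `["pos", "neg", None, "neg"]`; the third text stays unlabeled because no heuristic fires.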

Algorithms: a chain of disruptions in Deep Learning

Neural Networks are the workhorse of Deep Learning (cf. Goldberg and Hirst (2017) for an introduction to the basic architectures in the NLP context). Convolutional Neural Networks have gained ground in the past years, whereas the popularity of the traditional Recurrent Neural Network (RNN) is dropping. This is due, on the one hand, to the availability of more efficient RNN-based architectures such as LSTM and GRU. On the other hand, a new and pretty disruptive mechanism for sequential processing, attention, has been introduced into the sequence-to-sequence (seq2seq) model of Sutskever et al. (2014).
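The core of attention fits in a few lines: the decoder scores each encoder state against its current state, turns the scores into weights with a softmax, and takes the weighted average of the encoder states as a context vector. This sketch uses dot-product scoring and plain Python lists for readability; the original seq2seq attention computed scores with a small feed-forward network instead, so treat this as a simplified variant:

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention for one decoder step: returns (weights, context vector)."""
    # Score each encoder state by its similarity to the current decoder state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted average of the encoder states.
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context
```

With encoder states `[[1, 0], [0, 1], [0.5, 0.5]]` and query `[0, 5]`, most of the weight goes to the second state, the one most similar to the query.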

Consolidating various NLP tasks

Of the three “global” NLP development curves (syntax, semantics, and context awareness), the third, awareness of a larger context, has already become one of the main drivers behind new Deep Learning algorithms.

A note on multilingual research

Think of different languages as different lenses through which we view the same world – they share many properties, a fact that is fully accommodated by modern learning algorithms with their increasing power for abstraction and generalization.

Spurred by the global AI hype, the NLP field is exploding with new approaches and disruptive improvements. There is a shift towards modeling meaning and context dependence, probably the most universal and challenging fact of human language. The generalisation power of modern algorithms allows for efficient scaling across different tasks, languages and datasets, thus significantly speeding up the ROI cycle of NLP developments and allowing for a flexible and efficient integration of NLP into individual business scenarios.

European Data Sharing Space

“Towards a European Data Sharing Space” BDVA Position Paper

BDVA (Big Data Value Association)

April 2019. Position paper:

This position paper is meant to i) support the dialog among European and national policy makers, industry, research, the public sector, and civil society in the definition of a common roadmap for the development and adoption of a pan-European Data Sharing Space, and ii) guide public and private investments in this area in the next Multiannual Financial Framework.

Data is the new oil in Industry 4.0

Why “data is the new oil” and what happens when energy meets Industry 4.0

By Nicholas Waller, published 19:42, November 14, 2018


At the Abu Dhabi International Petroleum Exhibition and Conference (ADIPEC) this week, the UAE’s minister of state for Artificial Intelligence, Omar bin Sultan Al Olama, went so far as to declare that “Data is the new oil.”

According to Daniel Yergin, a Pulitzer Prize-winning author, economic historian, and one of the world’s leading experts on the oil & gas sector, there is now a “symbiosis” between energy producers and the new knowledge economy. The production of oil & gas and the generation of data are now, Yergin argues, “wholly inter-dependent”.

What does Oil & Gas 4.0 look like in practice?

The greater use of automation and data collection has enabled an upsurge in the “de-manning” of oil & gas facilities.

Thanks to a significant increase in the number of sensors being deployed across operations, companies can monitor what is happening in real time, which markedly improves safety levels.

In the competitive environment of the Fourth Industrial Revolution, no business can afford to be left behind by failing to invest in new technologies, so strategic discussions are important.

more on big data in this IMS blog

more on industry 4.0 in this IMS blog

Data Lake

What is a Data Lake? A Super-Simple Explanation For Anyone

September 6, 2018 Bernard Marr

James Dixon, the CTO of Pentaho, is credited with naming the concept of a data lake. He uses the following analogy:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

A data lake holds data in an unstructured way: there is no hierarchy or organization among the individual pieces of data. It holds data in its rawest form, neither processed nor analyzed. Additionally, a data lake accepts and retains all data from all data sources, supports all data types, and applies schemas (the way the data is structured in a database) only when the data is ready to be used.
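Applying schemas only when the data is ready to be used is often called schema-on-read. A toy sketch, assuming records arrive as JSON text and a “schema” is simply a mapping from field names to conversion functions (the `RawLake` class is hypothetical, not a real product API):

```python
import json
from typing import Any, Callable, Dict, List

Schema = Dict[str, Callable[[Any], Any]]  # field name -> conversion applied at read time


class RawLake:
    """Toy data lake: stores every record exactly as received (no schema on write)."""

    def __init__(self) -> None:
        self._raw: List[str] = []

    def ingest(self, record_text: str) -> None:
        self._raw.append(record_text)  # nothing is validated or transformed here

    def __len__(self) -> int:
        return len(self._raw)

    def read(self, schema: Schema) -> List[Dict[str, Any]]:
        """Apply a schema only when the data is used (schema-on-read).

        Records that are not JSON objects, or that lack or cannot convert a
        required field, are skipped for this read but remain in the lake.
        """
        rows: List[Dict[str, Any]] = []
        for text in self._raw:
            try:
                obj = json.loads(text)
                if not isinstance(obj, dict):
                    continue
                rows.append({field: convert(obj[field]) for field, convert in schema.items()})
            except (ValueError, KeyError, TypeError):  # JSONDecodeError is a ValueError
                continue
        return rows
```

Ingesting `'{"sensor": "a1", "temp": "21.5"}'`, `'{"user": "x", "clicks": 3}'`, and `'not json at all'` succeeds for all three. Reading with `{"sensor": str, "temp": float}` returns only the sensor record, with `temp` converted to `21.5`; a different schema applied later can pull the click record out of the same raw store.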

What is a data warehouse?

A data warehouse stores data in an organized manner with everything archived and ordered in a defined way. When a data warehouse is developed, a significant amount of effort occurs during the initial stages to analyze data sources and understand business processes.


Data lakes retain all data: structured, semi-structured, and unstructured/raw. It’s possible that some of the data in a data lake will never be used. A data warehouse, by contrast, includes only data that has been processed (structured), and only the data needed for reporting or to answer specific business questions.
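For contrast, a data warehouse applies its schema when data is loaded (schema-on-write). The sketch below uses Python’s built-in `sqlite3` module as a stand-in warehouse; the `sales` table and its two columns are hypothetical. Records that do not fit the schema are rejected, and fields the reports do not need are dropped before storage:

```python
import sqlite3
from typing import Any, Dict, Iterable, Tuple


def load_sales_warehouse(records: Iterable[Dict[str, Any]]) -> Tuple[sqlite3.Connection, int]:
    """Toy schema-on-write load: returns (connection, number of rejected records)."""
    conn = sqlite3.connect(":memory:")
    # The schema is fixed before any data arrives.
    conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
    rejected = 0
    for rec in records:
        try:
            # Keep only the columns the reports need; extra fields are dropped.
            row = (str(rec["region"]), float(rec["amount"]))
        except (KeyError, TypeError, ValueError):
            rejected += 1  # does not fit the schema, so it never enters the warehouse
            continue
        conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", row)
    conn.commit()
    return conn, rejected
```

Loading `{"region": "EU", "amount": "10.5", "note": "promo"}` stores only the region and amount, while `{"amount": 3}` is rejected for lacking a region. Queries such as `SELECT region, SUM(amount) FROM sales GROUP BY region` then run against clean, uniformly typed rows.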


Since a data lake lacks structure, it’s relatively easy to make changes to models and queries.


Data scientists are typically the ones who access the data in data lakes because they have the skill-set to do deep analysis.


Since data warehouses are more mature than data lakes, the security for data warehouses is also more mature.

more on big data in this IMS blog

Capitalism in the age of big data

#FakeNews #DigitalRecommendationEngines interpretation of data, market dependency “stupid smart recommendation engines” monopolistic structure, keep competitiveness, big data, market concentration

Reinventing Capitalism in the Age of Big Data (Basic Books / Hachette, 2018) by Viktor Mayer-Schönberger and Thomas Ramge.

more on this broad topic in this IMS blog:

and in the LIB 290 blog:
