Searching for "Big Data"

Data Lake

What is a Data Lake? A Super-Simple Explanation For Anyone

September 6, 2018 Bernard Marr

https://www.linkedin.com/pulse/what-data-lake-super-simple-explanation-anyone-bernard-marr/

James Dixon, the CTO of Pentaho, is credited with naming the concept of a data lake. He uses the following analogy:

“If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.”

A data lake holds data in an unstructured way, with no hierarchy or organization among the individual pieces of data. It holds data in its rawest form: the data is not processed or analyzed. Additionally, a data lake accepts and retains all data from all data sources and supports all data types; schemas (the way the data is stored in a database) are applied only when the data is ready to be used.
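To make "schemas are applied only when the data is ready to be used" concrete, here is a minimal Python sketch of that idea (often called schema-on-read). It is an illustration only: the sample records, the field names, and the `read_with_schema` helper are all hypothetical and do not come from the article.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (the "lake").
# Nothing is validated or reshaped at write time.
RAW_LAKE = [
    '{"user": "ana", "amount": "19.90", "ts": "2018-09-06"}',
    '{"user": "bo", "clicks": 3}',
    'free-text note from a support call',
]


def read_with_schema(raw_records, schema):
    """Apply a schema at read time.

    `schema` maps field names to cast functions. Records that are not
    JSON objects, lack a field, or fail a cast are skipped by this
    reader, but they remain untouched in the lake.
    """
    rows = []
    for raw in raw_records:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON: skipped here, still stored in the lake
        if not isinstance(record, dict):
            continue
        if not all(field in record for field in schema):
            continue
        try:
            rows.append({f: cast(record[f]) for f, cast in schema.items()})
        except (TypeError, ValueError):
            continue
    return rows


# The schema is chosen only now, by the person asking the question.
SALES_SCHEMA = {"user": str, "amount": float}
sales = read_with_schema(RAW_LAKE, SALES_SCHEMA)
print(sales)
```

A different question (say, about clicks) would pass a different schema over the same raw records, which is the flexibility the lake trades structure for.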

What is a data warehouse?

A data warehouse stores data in an organized manner with everything archived and ordered in a defined way. When a data warehouse is developed, a significant amount of effort occurs during the initial stages to analyze data sources and understand business processes.

Data

Data lakes retain all data: structured, semi-structured and unstructured/raw. It’s possible that some of the data in a data lake will never be used. A data warehouse only includes data that is processed (structured) and only the data that is necessary to use for reporting or to answer specific business questions.

Agility

Since a data lake lacks structure, it’s relatively easy to make changes to models and queries.

Users

Data scientists are typically the ones who access the data in data lakes because they have the skill-set to do deep analysis.

Security

Since data warehouses are more mature than data lakes, the security for data warehouses is also more mature.

+++++++++++++++
more on big data in this IMS blog
https://blog.stcloudstate.edu/ims?s=big+data

Borgman data

book reviews:
https://bobmorris.biz/big-data-little-data-no-data-a-book-review-by-bob-morris
“The challenge is to make data discoverable, usable, assessable, intelligible, and interpretable, and do so for extended periods of time…To restate the premise of this book, the value of data lies in their use. Unless stakeholders can agree on what to keep and why, and invest in the invisible work necessary to sustain knowledge infrastructures, big data and little data alike will become no data.”
http://www.cjc-online.ca/index.php/journal/article/view/3152/3337
…the premise that data are not natural objects with their own essence, Borgman rather explores the different values assigned to them, as well as their many variations according to place, time, and the context in which they are collected. It is specifically through six “provocations” that she offers a deep engagement with different aspects of the knowledge industry. These include the reproducibility, sharing, and reuse of data; the transmission and publication of knowledge; the stability of scholarly knowledge, despite its increasing proliferation of forms and modes; the very porosity of the borders between different areas of knowledge; the costs, benefits, risks, and responsibilities related to knowledge infrastructure; and finally, investment in the sustainable acquisition and exploitation of data for scientific research.
Beyond the six provocations, there is a larger question concerning the legitimacy, continuity, and durability of all scientific research—hence the urgent need for further reflection, initiated eloquently by Borgman, on the fact that “despite the media hyperbole, having the right data is usually better than having more data.”
o Data management (Pages xviii-xix)
o Data definition (4-5 and 18-29)
p. 5 big data and little data are only awkwardly analogous to big science and little science. Modern science, or big science in Derek J. de Solla Price’s terms (https://en.wikipedia.org/wiki/Big_Science), is characterized by international, collaborative efforts and by the invisible colleges of researchers who know each other and who exchange information on a formal and informal basis. Little science is the three hundred years of independent, smaller-scale work to develop theory and method for understanding research problems. Little science is typified by heterogeneous methods, heterogeneous data and by local control and analysis.
p. 8 The Long Tail
a popular way of characterizing the availability and use of data in research areas or in economic sectors. https://en.wikipedia.org/wiki/Long_tail

o Provocations (13-15)
o Digital data collections (21-26)
o Knowledge infrastructures (32-35)
o Open access to research (39-42)
o Open technologies (45-47)
o Metadata (65-70 and 79-80)
o Common resources in astronomy (71-76)
o Ethics (77-79)
o Research Methods and data practices, and, Sensor-networked science and technology (84-85 and 106-113)
o Knowledge infrastructures (94-100)
o COMPLETE survey (102-106)
o Internet surveys (128-143)
o Twitter (130-133, 138-141, and 157-158)
o Pisa Clark/CLAROS project (179-185)
o Collecting Data, Analyzing Data, and Publishing Findings (181-184)
o Buddhist studies (186-200)
o Data citation (241-268)
o Negotiating authorship credit (253-256)
o Personal names (258-261)
o Citation metrics (266-209)
o Access to data (279-283)

++++++++++++++++
more on big data in education in this IMS blog
https://blog.stcloudstate.edu/ims?s=big+data

student data mining

Beyond the Horizon Webinar on Student Data

March 29, 2017 @ 12-1pm US Central Time

NMC Beyond the Horizon > Integrating Student Data Across Platforms

The growing use of data mining software in online education has great potential to support student success by identifying and reaching out to struggling students and streamlining the path to graduation. This can be a challenge for institutions that are using a variety of technology systems that are not integrated with each other. As institutions implement learning management systems, degree planning technologies, early alert systems, and tutor scheduling that promote increased interactions among various stakeholders, there is a need for centralized aggregation of these data to provide students with holistic support that improves learning outcomes. Join us to hear from an institutional exemplar who is building solutions that integrate student data across platforms. Then work with peers to address challenges and develop solutions of your own.

+++++++++++++++++++++++
more on altmetrics in this IMS blog
https://blog.stcloudstate.edu/ims?s=altmetrics

more on big data in this IMS blog
https://blog.stcloudstate.edu/ims?s=big+data

big data and school absence

Data Can Help Schools Confront ‘Chronic Absence’

By Dian Schaffhauser 09/22/16

https://thejournal.com/articles/2016/09/22/data-can-help-schools-confront-chronic-absence.aspx

The data was shared in June by the Office for Civil Rights, which compiled it from a 2013-2014 survey completed by nearly every school district and school in the United States. What is new is a report from Attendance Works and the Everyone Graduates Center that encourages schools and districts to use their own data to pinpoint ways to take on the challenge of chronic absenteeism.

The first is research that shows that missing that much school is correlated with “lower academic performance and dropping out.” Second, it also helps in identifying students earlier in the semester in order to get a jump on possible interventions.

The report offers a six-step process for using data tied to chronic absence in order to reduce the problem.

The first step is investing in “consistent and accurate data.” That’s where the definition comes in — to make sure people have a “clear understanding” and so that it can be used “across states and districts” with school years that vary in length. The same step also requires “clarifying what counts as a day of attendance or absence.”

The second step is to use the data to understand what the need is and who needs support in getting to school. This phase could involve defining multiple tiers of chronic absenteeism (at-risk, moderate or severe), and then analyzing the data to see if there are differences by student sub-population — grade, ethnicity, special education, gender, free and reduced price lunch, neighborhood or other criteria that require special kinds of intervention.
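As a rough illustration of this second step, the sketch below assigns each student to a tier and then counts tiers per sub-population. Everything specific here is an assumption made for the example: the excerpt gives no cut-offs, so the 5%, 10%, and 20% thresholds, the student records, and the function names are hypothetical. Tiers are based on the share of enrolled days missed rather than a raw day count, so the same rule can apply to school years of different lengths.

```python
from collections import Counter, defaultdict


def absence_tier(days_absent, days_enrolled):
    """Classify a student by share of enrolled days missed.

    The 5% / 10% / 20% cut-offs are assumptions for this sketch,
    not values taken from the report excerpt.
    """
    if days_enrolled <= 0:
        raise ValueError("days_enrolled must be positive")
    rate = days_absent / days_enrolled
    if rate >= 0.20:
        return "severe"
    if rate >= 0.10:
        return "moderate"
    if rate >= 0.05:
        return "at-risk"
    return "satisfactory"


# Hypothetical student records.
STUDENTS = [
    {"id": 1, "grade": 9, "days_absent": 2, "days_enrolled": 90},
    {"id": 2, "grade": 9, "days_absent": 10, "days_enrolled": 90},
    {"id": 3, "grade": 10, "days_absent": 20, "days_enrolled": 90},
    {"id": 4, "grade": 10, "days_absent": 5, "days_enrolled": 90},
]


def tiers_by_group(students, group_key):
    """Count how many students fall in each tier, per sub-population."""
    counts = defaultdict(Counter)
    for s in students:
        tier = absence_tier(s["days_absent"], s["days_enrolled"])
        counts[s[group_key]][tier] += 1
    return {group: dict(c) for group, c in counts.items()}


by_grade = tiers_by_group(STUDENTS, "grade")
print(by_grade)
```

Swapping `"grade"` for another key in the records (neighborhood, for instance) gives the same breakdown for a different sub-population.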

Step three asks schools and districts to use the data to identify places getting good results. By comparing chronic absence rates across the district or against schools with similar demographics, the “positive outliers” may surface, showing people that the problem isn’t unstoppable but something that can be addressed for the better.
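One way to picture the search for “positive outliers” is sketched below. It is only an illustration under stated assumptions: the school records are made up, similarity is reduced to a single hypothetical measure (share of students on free and reduced price lunch, `frl_share`), and the `band` and `margin` parameters are arbitrary choices, not anything the report prescribes.

```python
from statistics import mean

# Hypothetical schools: share of students on free/reduced price lunch
# and share of students who are chronically absent.
SCHOOLS = [
    {"name": "A", "frl_share": 0.72, "chronic_rate": 0.21},
    {"name": "B", "frl_share": 0.70, "chronic_rate": 0.09},
    {"name": "C", "frl_share": 0.75, "chronic_rate": 0.24},
    {"name": "D", "frl_share": 0.20, "chronic_rate": 0.06},
    {"name": "E", "frl_share": 0.25, "chronic_rate": 0.08},
]


def positive_outliers(schools, band=0.10, margin=0.05):
    """Return names of schools doing clearly better than similar peers.

    A peer is any other school whose frl_share is within `band`.
    A school is flagged when its chronic absence rate is at least
    `margin` below the average rate of its peers. Both parameters
    are assumptions chosen for this sketch.
    """
    found = []
    for school in schools:
        peers = [
            p for p in schools
            if p is not school
            and abs(p["frl_share"] - school["frl_share"]) <= band
        ]
        if not peers:
            continue  # nothing comparable to measure against
        peer_avg = mean(p["chronic_rate"] for p in peers)
        if peer_avg - school["chronic_rate"] >= margin:
            found.append(school["name"])
    return found


outliers = positive_outliers(SCHOOLS)
print(outliers)
```

In this made-up data, school B serves a population similar to A and C but has a far lower chronic absence rate, so it is the one worth visiting to learn what is working.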

Steps five and six call on schools and districts to help people understand why the absences are happening and to develop ways to address the problem.

The report links to free data tools on the Attendance Works website, including a calculator for tallying chronic absences and guidance on how to protect student privacy when sharing data.

The full report is freely available on the Attendance Works website.

++++++++++++++
more on big data in education in this IMS blog
https://blog.stcloudstate.edu/ims?s=data

text and data mining

38 great resources for learning data mining concepts and techniques

http://www.rubedo.com.br/2016/08/38-great-resources-for-learning-data.html

Learn data mining languages: R, Python and SQL

W3Schools – Fantastic set of interactive tutorials for learning different languages. Their SQL tutorial is second to none. You’ll learn how to manipulate data in MySQL, SQL Server, Access, Oracle, Sybase, DB2 and other database systems.
Treasure Data – The best way to learn is to work towards a goal. That’s what this helpful blog series is all about. You’ll learn SQL from scratch by following along with a simple, but common, data analysis scenario.
10 Queries – This course is recommended for the intermediate SQL-er who wants to brush up on his/her skills. It’s a series of 10 challenges coupled with forums and external videos to help you improve your SQL knowledge and understanding of the underlying principles.
TryR – Created by Code School, this interactive online tutorial system is designed to step you through R for statistics and data modeling. As you work through their seven modules, you’ll earn badges to track your progress, helping you to stay on track.
Leada – If you’re a complete R novice, try Leada’s introduction to R. In their 1 hour 30 min course, they’ll cover installation, basic usage, common functions, data structures, and data types. They’ll even set you up with your own development environment in RStudio.
Advanced R – Once you’ve mastered the basics of R, bookmark this page. It’s a fantastically comprehensive style guide to using R. We should all strive to write beautiful code, and this resource (based on Google’s R style guide) is your key to that ideal.
Swirl – Learn R in R – a radical idea certainly. But that’s exactly what Swirl does. They’ll interactively teach you how to program in R and do some basic data science at your own pace. Right in the R console.
Python for beginners – The Python website actually has a pretty comprehensive and easy-to-follow set of tutorials. You can learn everything from installation to complex analyses. It also gives you access to the Python community, who will be happy to answer your questions.
PythonSpot – A complete list of Python tutorials to take you from zero to Python hero. There are tutorials for beginners, intermediate and advanced learners.

Read all about it: data mining books

Data Jujitsu: The Art of Turning Data into Product – This free book by DJ Patil gives you a brief introduction to the complexity of data problems and how to approach them. He gives nice, understandable examples that cover the most important thought processes of data mining. It’s a great book for beginners but still interesting to the data mining expert. Plus, it’s free!
Data Mining: Concepts and Techniques – The third (and most recent) edition will give you an understanding of the theory and practice of discovering patterns in large data sets. Each chapter is a stand-alone guide to a particular topic, making it a good resource if you’re not into reading in sequence or you want to know about a particular topic.
Mining of Massive Datasets – Based on the Stanford Computer Science course, this book is often cited by data scientists as one of the most helpful resources around. It’s designed at the undergraduate level with no formal prerequisites. It’s the next best thing to actually going to Stanford!
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners – This book is a must read for anyone who needs to do applied data mining in a business setting (i.e., practically everyone). It’s a complete resource for anyone looking to cut through the Big Data hype and understand the real value of data mining. Pay particular attention to the section on how modeling can be applied to business decision making.
Data Smart: Using Data Science to Transform Information into Insight – The talented (and funny) John Foreman from MailChimp teaches you the “dark arts” of data science. He makes modern statistical methods and algorithms accessible and easy to implement.
Hadoop: The Definitive Guide – As a data scientist, you will undoubtedly be asked about Hadoop. So you’d better know how it works. This comprehensive guide will teach you how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. Make sure you get the most recent edition to keep up with this fast-changing service.

Online learning: data mining webinars and courses

DataCamp – Learn data mining from the comfort of your home with DataCamp’s online courses. They have free courses on R, Statistics, Data Manipulation, Dynamic Reporting, Large Data Sets and much more.
Coursera – Coursera brings you all the best University courses straight to your computer. Their online classes will teach you the fundamentals of interpreting data, performing analyses and communicating insights. They have topics for beginners and advanced learners in Data Analysis, Machine Learning, Probability and Statistics and more.
Udemy – With a range of free and paid data mining courses, you’re sure to find something you like on Udemy no matter your level. There are 395 in the area of data mining! All their courses are uploaded by other Udemy users, meaning quality can fluctuate, so make sure you read the reviews.
CodeSchool – These courses are handily organized into “Paths” based on the technology you want to learn. You can do everything from build a foundation in Git to take control of a data layer in SQL. Their engaging online videos will take you step-by-step through each lesson and their challenges will let you practice what you’ve learned in a controlled environment.
Udacity – Master a new skill or programming language with Udacity’s unique series of online courses and projects. Each class is developed by a Silicon Valley tech giant, so you know what you’re learning will be directly applicable to the real world.
Treehouse – Learn from experts in web design, coding, business and more. The video tutorials from Treehouse will teach you the basics and their quizzes and coding challenges will ensure the information sticks. And their UI is pretty easy on the eyes.

Learn from the best: top data miners to follow

John Foreman – Chief Data Scientist at MailChimp and author of Data Smart, John is worth a follow for his witty yet poignant tweets on data science.
DJ Patil – Author and Chief Data Scientist at The White House OSTP, DJ tweets everything you’ve ever wanted to know about data in politics.
Nate Silver – He’s Editor-in-Chief of FiveThirtyEight, a blog that uses data to analyze news stories in Politics, Sports, and Current Events.
Andrew Ng – As the Chief Data Scientist at Baidu, Andrew is responsible for some of the most groundbreaking developments in Machine Learning and Data Science.
Bernard Marr – He might know pretty much everything there is to know about Big Data.
Gregory Piatetsky – He’s the author of popular data science blog KDNuggets, the leading newsletter on data mining and knowledge discovery.
Christian Rudder – As the Co-founder of OKCupid, Christian has access to one of the most unique datasets on the planet and he uses it to give fascinating insight into human nature, love, and relationships.
Dean Abbott – He’s contributed to a number of data blogs and authored his own book on Applied Predictive Analytics. At the moment, Dean is Chief Data Scientist at SmarterHQ.

Practice what you’ve learned: data mining competitions

Kaggle – This is the ultimate data mining competition. The world’s biggest corporations offer big prizes for solving their toughest data problems.
Stack Overflow – The best way to learn is to teach. Stack Overflow offers the perfect forum for you to prove your data mining know-how by answering fellow enthusiasts’ questions.
TunedIT – With a live leaderboard and interactive participation, TunedIT offers a great platform to flex your data mining muscles.
DrivenData – You can find a number of nonprofit data mining challenges on DrivenData. All of your mining efforts will go towards a good cause.
Quora – Another great site to answer questions on just about everything. There are plenty of curious data lovers on there asking for help with data mining and data science.

Meet your fellow data miner: social networks, groups and meetups

Reddit – Reddit is a forum for finding the latest articles on data mining and connecting with fellow data scientists. We recommend subscribing to r/datamining, r/dataisbeautiful, r/datascience, r/machinelearning and r/bigdata.
Facebook – As with many social media platforms, Facebook is a great place to meet and interact with people who have similar interests. There are a number of very active data mining groups you can join.
LinkedIn – If you’re looking for data mining experts in a particular field, look no further than LinkedIn. There are hundreds of data mining groups ranging from the generic to the hyper-specific. In short, there’s sure to be something for everyone.
Meetup – Want to meet your fellow data miners in person? Attend a meetup! Just search for data mining in your city and you’re sure to find an awesome group near you.
——————————

8 fantastic examples of data storytelling

Data storytelling is the realization of great data visualization. We’re seeing data that’s been analyzed well and presented in a way that someone who’s never even heard of data science can get it.

Google’s Cole Nussbaumer provides a friendly reminder of what data storytelling actually is: straightforward, strategic, elegant, and simple.

++++++++++++++++++++++

more on text and data mining in this IMS blog
https://blog.stcloudstate.edu/ims?s=data+mining

European Commission and text and data mining

The EU just told data mining startups to take their business elsewhere

Lenard Koschwitz

By enabling the development and creation of big data for non-commercial use only, the European Commission has come up with a half-baked policy. Startups will be discouraged from mining in Europe and it will be impossible for companies to grow out of universities in the EU.

++++++++++++++++++

more on copyright and text and data mining in this IMS blog
https://blog.stcloudstate.edu/ims?s=copyrig
https://blog.stcloudstate.edu/ims?s=data+mining

how teachers use data

The Three Ways Teachers Use Data—and What Technology Needs to Do Better

By Karen Johnson May 17, 2016

https://www.edsurge.com/news/2016-05-17-the-three-ways-teachers-use-data-and-what-technology-can-do-better

After surveying more than 4,650 educators, we learned that teachers are essentially trying to do three things with data—each of which technology can dramatically improve:

1. Assess

2. Analyze

3. Pivot

+++++++++++++++++++++++++++++

What’s At Risk When Schools Focus Too Much on Student Data?

The U.S. Department of Education has increasingly encouraged and funded states to collect and analyze information about students: grades, state test scores, attendance, behavior, lateness, graduation rates and school climate measures like surveys of student engagement.

The argument in favor of all this is that the more we know about how students are doing, the better we can target instruction and other interventions. And sharing that information with parents and the community at large is crucial. It can motivate big changes.

The question is what might be lost when schools focus too much on data. Here are five arguments against the excesses of data-driven instruction.

1) Motivation stereotype threat.

It could create negative feelings about school, threatening students’ sense of belonging, which is key to academic motivation.

2) Helicoptering

Today, parents increasingly are receiving daily text messages with photos and videos from the classroom. A style of overly involved “intrusive parenting” has been associated in studies with increased levels of anxiety and depression when students reach college. “Parent portals as utilized in K-12 education are doing significant harm to student development,” argues college instructor John Warner in a recent piece for Inside Higher Ed.

3) Commercial Monitoring and Marketing

The National Education Policy Center releases annual reports on commercialization and marketing in public schools. In its most recent report in May, researchers there raised concerns about targeted marketing to students using computers for schoolwork and homework. Companies like Google pledge not to track the content of schoolwork for the purposes of advertising. But in reality these boundaries can be a lot more porous. For example, a high school student profiled in the NEPC report often consulted commercial programs like dictionary.com and Sparknotes: “Once when she had been looking at shoes, she mentioned, an ad for shoes appeared in the middle of a Sparknotes chapter summary.”

4) Missing What Data Can’t Capture

Computer systems are most comfortable recording and analyzing quantifiable, structured data. The number of absences in a semester, say; or a three-digit score on a multiple-choice test that can be graded by machine, where every question has just one right answer.

5) Exposing Students’ “Permanent Records”

In the past few years several states have passed laws banning employers from looking at the credit reports of job applicants. Employers want people who are reliable and responsible. But privacy advocates argue that a past medical issue or even a bankruptcy shouldn’t unfairly dun a person who needs a fresh start.

++++++++++++++++++++++++++++
more on big data in education in this blog:
https://blog.stcloudstate.edu/ims?s=big+data+education

57 Jobs of the Future

Metaverse Jobs

  1. Metaverse World Designers
  2. Avatar Designers
  3. Metaverse Storefront Creators, Developers, and Operators
  4. Metaverse Law Enforcement
  5. DAO Attorneys

Cryptocurrency

  1. Crypto Coaches and Advisors
  2. Crypto Mortgage Specialists
  3. Decentralization Managers

Healthcare

  1. Amnesia Surgeons – Doctors who are skilled in removing bad memories or destructive behavior.
  2. Memory Augmentation Therapists – Entertainment is all about the great memories it creates. Creating a better grade of memories can dramatically change who we are and pave the way for an entirely new class of humans.
  3. Digital Implant Architects
  4. Genetic Troubleshooters
  5. Body Part Fabricators
  6. AI Health Managers

Big Data

  1. Privacy Strategists
  2. Personal Data Managers, Archivists, and Protectors
  3. Blockchain Designers
  4. Vulnerabilities Analyst

Future Education

  1. AI Memory Assessment Engineers
  2. AI Coach-Bot Designers
  3. AI Teacher-Bot Developers

 

Privacy and Safety in Remote Learning Environments

BLEND-ONLINE : Call for Chapter Proposals – Privacy and Remote Learning

Digital Scholarship Initiatives at Middle Tennessee State University invites you to propose a chapter for our forthcoming book.

Working book title: Privacy and Safety in Remote Learning Environments

Proposal submission deadline: January 21, 2022

Interdisciplinary perspectives are highly encouraged

Topics may include but are not limited to:

  • Privacy policies of 3rd party EdTech platforms (Google Classroom, Microsoft Teams, Schoology, etc)
  • Parental “spying” and classroom privacy
  • Family privacy and synchronous online schooling
  • Online harassment among students (private chats, doxing, social media, etc)
  • Cameras in student private spaces
  • Surveillance of student online activities
  • Exam proctoring software and privacy concerns
  • Personally Identifiable Information in online learning systems and susceptibility to cybercriminals
  • Privacy, storage, and deletion policies for recordings and data
  • Handling data removal requests from students
  • Appointing a privacy expert in schools, universities, or districts
  • How and why to perform security/privacy audits
  • Student attitudes about online privacy
  • Instructor privacy/safety concerns
  • Libraries: privacy policies of ebook platforms
  • Libraries: online reference services and transcripts
  • Identity authentication best practices
  • Learning analytics and “big data” in higher education

More details, timelines, and submission instructions are available at dsi.mtsu.edu/cfpBook2022

Digital Humanities for Librarians

By: Emma Annette Wilson

  • Publisher: Rowman & Littlefield Publishers
  • Print ISBN: 9781538116449, 1538116448

Digital Humanities For Librarians. Some librarians are born to digital humanities; some aspire to digital humanities; and some have digital humanities thrust upon them. Digital Humanities For Librarians is a one-stop resource for librarians and LIS students working in this growing new area of academic librarianship. The book begins by introducing digital humanities, addressing key questions such as, “What is it?”, “Who does it?”, “How do they do it?”, “Why do they do it?”, and “How can I do it?”. This broad overview is followed by a series of practical chapters answering those questions with step-by-step approaches to both the digital and the human elements of digital humanities librarianship. Digital Humanities For Librarians covers a wide range of technologies currently used in the field, from creating digital exhibits, archives, and databases, to digital mapping, text encoding, and computational text analysis (big data for the humanities). However, the book never loses sight of the all-important human component to digital humanities work, and culminates in a series of chapters on management and personnel strategies in this area. These chapters walk readers through approaches to project management, effective collaboration, outreach, the reference interview for digital humanities, sustainability, and data management, making this a valuable resource for administrators as well as librarians directly involved in digital humanities work. There is also a consideration of budgeting questions, including strategies for supporting digital humanities work on a shoestring. 
Special features include:

  • Case studies of a wide range of projects and management issues
  • Digital instructional documents guiding readers through specific digital technologies and techniques
  • An accompanying website featuring digital humanities tools and resources and digital interviews with librarians and scholars leading the way in digital humanities work across North America, from a range of larger and smaller institutions

Whether you are a librarian primarily working in digital humanities for the first time, a student hoping to do so, or a librarian in a cognate area newly-charged with these responsibilities, Digital Humanities For Librarians will be with you every step of the way, drawing on the author’s experiences and those of a network of librarians and scholars to give you the practical support and guidance needed to bring your digital humanities initiatives to life.
