‘Anonymous’ browsing data can be easily exposed, researchers reveal
A similar strategy was used in 2008, Dewes said, to deanonymise a set of ratings published by Netflix to help computer scientists improve its recommendation algorithm: by comparing “anonymous” ratings of films with public profiles on IMDB, researchers were able to unmask Netflix users – including one woman, a closeted lesbian, who went on to sue Netflix for the privacy violation.
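The linking idea described above can be sketched in a few lines. This is a simplified illustration, not the researchers' actual method: it assumes exact matches on hypothetical (title, score) pairs, and the function name `link_profiles`, the `min_overlap` parameter, and all sample data are invented for the example.

```python
from collections import defaultdict

def link_profiles(anonymous_ratings, public_ratings, min_overlap=3):
    """Match anonymous IDs to public usernames by shared (title, score) pairs."""
    # Index the public ratings: (title, score) -> set of usernames
    index = defaultdict(set)
    for user, title, score in public_ratings:
        index[(title, score)].add(user)

    # For each anonymous ID, count how many ratings each public user shares
    overlap = defaultdict(lambda: defaultdict(int))
    for anon_id, title, score in anonymous_ratings:
        for user in index.get((title, score), ()):
            overlap[anon_id][user] += 1

    matches = {}
    for anon_id, counts in overlap.items():
        best_user, best = max(counts.items(), key=lambda kv: kv[1])
        ties = [u for u, c in counts.items() if c == best]
        # Accept only a unique best candidate with enough shared ratings
        if best >= min_overlap and len(ties) == 1:
            matches[anon_id] = best_user
    return matches

# Hypothetical sample data: (id, title, score)
anonymous = [
    ("u1", "Film A", 5), ("u1", "Film B", 2), ("u1", "Film C", 4),
    ("u2", "Film A", 5), ("u2", "Film D", 3),
]
public = [
    ("alice", "Film A", 5), ("alice", "Film B", 2), ("alice", "Film C", 4),
    ("bob", "Film A", 5), ("bob", "Film E", 1),
]

linked = link_profiles(anonymous, public)
```

Even this crude version shows why a handful of overlapping ratings can be enough to single someone out: `u1` shares three ratings with one public profile and is linked, while `u2` has too little overlap to be matched.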
A hacker explains the best way to browse the internet anonymously.
more on privacy in this IMS blog
New Report Examines Use of Big Data in Ed
By Dian Schaffhauser 05/17/17
A new report from the National Academy of Education, “Big Data in Education,” summarizes the findings of a recent workshop held by the academy.
three federal laws: Family Educational Rights and Privacy Act (FERPA), the Children’s Online Privacy Protection Act (COPPA) and the Protection of Pupil Rights Amendment (PPRA).
Over the last four years, 49 states and the District of Columbia have introduced 410 bills related to student data privacy, and 36 states have passed 85 new education data privacy laws. Also, since 2014, 19 states have passed laws that in some way address the work done by researchers.
First, researchers need to get better at communicating about their projects, especially with non-researchers.
One approach to follow in gaining trust “from parents, advocates and teachers” uses the acronym CUPS:
- Collection: What data is collected by whom and from whom;
- Use: How the data will be used and what the purpose of the research is;
- Protection: What forms of data security protection are in place and how access will be limited; and
- Sharing: How and with whom the results of the data work will be shared.
Second, researchers must pin down how to share data without making it vulnerable to theft.
Third, researchers should build partnerships of trust and “mutual interest” pertaining to their work with data. Those alliances may involve education technology developers, education agencies both local and state, and data privacy stakeholders.
Along with the summary report, the results of the workshop are being maintained on a page within the Academy’s website here.
more on big data in education in this IMS blog
Call For Chapters: Responsible Analytics and Data Mining in Education: Global Perspectives on Quality, Support, and Decision-Making
SUBMIT A 1-2 PAGE CHAPTER PROPOSAL
Deadline – June 1, 2017
Title: Responsible Analytics and Data Mining in Education: Global Perspectives on Quality, Support, and Decision-Making
Due to rapid advancements in our ability to collect, process, and analyze massive amounts of data, it is now possible for educators at all levels to gain new insights into how people learn. According to Bainbridge et al. (2015), using simple learning analytics models, educators now have the tools to identify, with up to 80% accuracy, which students are at the greatest risk of failure before classes even begin. As we consider the enormous potential of data analytics and data mining in education, we must also recognize a myriad of emerging issues and potential consequences, intentional and unintentional, to implement them responsibly. For example:
· Who collects and controls the data?
· Is it accessible to all stakeholders?
· How are the data being used, and is there a possibility for abuse?
· How do we assess data quality?
· Who determines which data to trust and use?
· What happens when the data analysis yields flawed results?
· How do we ensure due process when data-driven errors are uncovered?
· What policies are in place to address errors?
· Is there a plan for handling data breaches?
This book, published by Routledge Taylor & Francis Group, will provide insights and support for policy makers, administrators, faculty, and IT personnel on issues pertaining to the responsible use of data analytics and data mining in education.
· June 1, 2017 – Chapter proposal submission deadline
· July 15, 2017 – Proposal decision notification
· October 15, 2017 – Full chapter submission deadline
· December 1, 2017 – Full chapter decision notification
· January 15, 2018 – Full chapter revisions due
more on data mining in this IMS blog
more on analytics in this IMS blog
Big Data as Big Success
Analysis of large data sets can help many businesses solve problems and reduce losses and missed opportunities, says Alexander Efremov.
more on big data in this IMS blog
Beyond the Horizon Webinar on Student Data
March 29, 2017 @ 12-1pm US Central Time
NMC Beyond the Horizon > Integrating Student Data Across Platforms
The growing use of data mining software in online education has great potential to support student success by identifying and reaching out to struggling students and streamlining the path to graduation. This can be a challenge for institutions that are using a variety of technology systems that are not integrated with each other. As institutions implement learning management systems, degree planning technologies, early alert systems, and tutor scheduling that promote increased interactions among various stakeholders, there is a need for centralized aggregation of these data to provide students with holistic support that improves learning outcomes. Join us to hear from an institutional exemplar who is building solutions that integrate student data across platforms. Then work with peers to address challenges and develop solutions of your own.
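As a rough illustration of the "centralized aggregation" described above, the sketch below merges records from several systems into one profile per student. The system names, field names, and the `aggregate_by_student` helper are all hypothetical, and a real integration would also need to deal with mismatched identifiers, access control, and privacy rules.

```python
from collections import defaultdict

def aggregate_by_student(**sources):
    """Merge per-system records into one profile per student ID."""
    profiles = defaultdict(dict)
    for system, records in sources.items():
        for record in records:
            student_id = record["student_id"]
            # Prefix each field with its source system to avoid key clashes
            for field, value in record.items():
                if field != "student_id":
                    profiles[student_id][f"{system}.{field}"] = value
    return dict(profiles)

# Hypothetical exports from three unconnected systems
lms = [
    {"student_id": "s1", "last_login_days": 12},
    {"student_id": "s2", "last_login_days": 1},
]
alerts = [{"student_id": "s1", "open_alerts": 2}]
tutoring = [{"student_id": "s2", "sessions": 3}]

profiles = aggregate_by_student(lms=lms, alerts=alerts, tutoring=tutoring)
```

Keeping the source system in each field name makes it easy to see where a value came from when an advisor or early alert tool reads the combined profile.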
more on altmetrics in this IMS blog
more on big data in this IMS blog
Data Can Help Schools Confront ‘Chronic Absence’
By Dian Schaffhauser 09/22/16
The data was shared in June by the Office for Civil Rights, which compiled it from a 2013-2014 survey completed by nearly every school district and school in the United States. New is a report from Attendance Works and the Everyone Graduates Center that encourages schools and districts to use their own data to pinpoint ways to take on the challenge of chronic absenteeism.
The first is research that shows that missing that much school is correlated with “lower academic performance and dropping out.” Second, it also helps in identifying students earlier in the semester in order to get a jump on possible interventions.
The report offers a six-step process for using data tied to chronic absence in order to reduce the problem.
The first step is investing in “consistent and accurate data.” That’s where the definition comes in — to make sure people have a “clear understanding” and so that it can be used “across states and districts” with school years that vary in length. The same step also requires “clarifying what counts as a day of attendance or absence.”
The second step is to use the data to understand what the need is and who needs support in getting to school. This phase could involve defining multiple tiers of chronic absenteeism (at-risk, moderate or severe), and then analyzing the data to see if there are differences by student sub-population — grade, ethnicity, special education, gender, free and reduced price lunch, neighborhood or other criteria that require special kinds of intervention.
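A minimal sketch of that second step might look like the following. The tier cut-offs are assumptions made for illustration only (the report's own thresholds are not given here), as are the field names and the helpers `absence_tier` and `tier_counts_by_group`. Working with the share of enrolled days missed, rather than a raw day count, keeps the tiers comparable across school years of different lengths.

```python
from collections import defaultdict

# Hypothetical cut-offs (share of enrolled days missed), highest first.
# These values are illustrative assumptions, not taken from the report.
TIERS = [(0.20, "severe"), (0.10, "moderate"), (0.05, "at-risk")]

def absence_tier(days_absent, days_enrolled):
    """Return a tier label, or None if below the lowest cut-off."""
    if days_enrolled <= 0:
        raise ValueError("days_enrolled must be positive")
    rate = days_absent / days_enrolled
    for cutoff, label in TIERS:
        if rate >= cutoff:
            return label
    return None

def tier_counts_by_group(students, group_key):
    """Count students per tier within each sub-population."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in students:
        tier = absence_tier(s["days_absent"], s["days_enrolled"])
        counts[s[group_key]][tier or "none"] += 1
    return {group: dict(c) for group, c in counts.items()}

# Hypothetical student records
students = [
    {"id": 1, "grade": 9, "days_absent": 40, "days_enrolled": 180},
    {"id": 2, "grade": 9, "days_absent": 20, "days_enrolled": 180},
    {"id": 3, "grade": 10, "days_absent": 10, "days_enrolled": 180},
    {"id": 4, "grade": 10, "days_absent": 2, "days_enrolled": 90},
]

by_grade = tier_counts_by_group(students, "grade")
```

Swapping `"grade"` for another key in the records would break the same counts down by any other sub-population a district tracks.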
Step three asks schools and districts to use the data to identify places getting good results. By comparing chronic absence rates across the district or against schools with similar demographics, the “positive outliers” may surface, showing people that the problem isn’t unstoppable but something that can be addressed for the better.
Steps five and six call on schools and districts to help people understand why the absences are happening and to develop ways to address the problem.
The report links to free data tools on the Attendance Works website, including a calculator for tallying chronic absences and guidance on how to protect student privacy when sharing data.
The full report is freely available on the Attendance Works website.
more on big data in education in this IMS blog
Learn data mining languages: R, Python and SQL
– Fantastic set of interactive tutorials for learning different languages. Their SQL tutorial is second to none. You’ll learn how to manipulate data in MySQL, SQL Server, Access, Oracle, Sybase, DB2 and other database systems.
– The best way to learn is to work towards a goal. That’s what this helpful blog series is all about. You’ll learn SQL from scratch by following along with a simple, but common, data analysis scenario.
– This course is recommended for the intermediate SQL-er who wants to brush up on his/her skills. It’s a series of 10 challenges coupled with forums and external videos to help you improve your SQL knowledge and understanding of the underlying principles.
– Created by Code School, this interactive online tutorial system is designed to step you through R for statistics and data modeling. As you work through their seven modules, you’ll earn badges that track your progress and help you stay on course.
– If you’re a complete R novice, try Lead’s introduction to R. In their 1 hour 30 min course, they’ll cover installation, basic usage, common functions, data structures, and data types. They’ll even set you up with your own development environment in RStudio.
– Once you’ve mastered the basics of R, bookmark this page. It’s a fantastically comprehensive style guide to using R. We should all strive to write beautiful code, and this resource (based on Google’s R style guide) is your key to that ideal.
– Learn R in R – a radical idea certainly. But that’s exactly what Swirl does. They’ll interactively teach you how to program in R and do some basic data science at your own pace. Right in the R console.
Python for beginners
– The Python website actually has a pretty comprehensive and easy-to-follow set of tutorials. You can learn everything from installation to complex analyses. It also gives you access to the Python community, who will be happy to answer your questions.
– A complete list of Python tutorials to take you from zero to Python hero. There are tutorials for beginners, intermediate and advanced learners.
Read all about it: data mining books
Data Jujitsu: The Art of Turning Data into Product
– This free book by DJ Patil gives you a brief introduction to the complexity of data problems and how to approach them. He gives nice, understandable examples that cover the most important thought processes of data mining. It’s a great book for beginners but still interesting to the data mining expert. Plus, it’s free!
Data Mining: Concepts and Techniques
– The third (and most recent) edition will give you an understanding of the theory and practice of discovering patterns in large data sets. Each chapter is a stand-alone guide to a particular topic, making it a good resource if you’re not into reading in sequence or you want to know about a particular topic.
Mining of Massive Datasets
– Based on the Stanford Computer Science course, this book is often cited by data scientists as one of the most helpful resources around. It’s designed at the undergraduate level with no formal prerequisites. It’s the next best thing to actually going to Stanford!
Hadoop: The Definitive Guide
– As a data scientist, you will undoubtedly be asked about Hadoop. So you’d better know how it works. This comprehensive guide will teach you how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. Make sure you get the most recent edition to keep up with this fast-changing service.
Online learning: data mining webinars and courses
– Learn data mining from the comfort of your home with DataCamp’s online courses. They have free courses on R, Statistics, Data Manipulation, Dynamic Reporting, Large Data Sets and much more.
– Coursera brings you all the best University courses straight to your computer. Their online classes will teach you the fundamentals of interpreting data, performing analyses and communicating insights. They have topics for beginners and advanced learners in Data Analysis, Machine Learning, Probability and Statistics and more.
– With a range of free and paid data mining courses, you’re sure to find something you like on Udemy no matter your level. There are 395 in the area of data mining! All their courses are uploaded by other Udemy users, meaning quality can fluctuate, so make sure you read the reviews.
– These courses are handily organized into “Paths” based on the technology you want to learn. You can do everything from build a foundation in Git to take control of a data layer in SQL. Their engaging online videos will take you step-by-step through each lesson and their challenges will let you practice what you’ve learned in a controlled environment.
– Master a new skill or programming language with Udacity’s unique series of online courses and projects. Each class is developed by a Silicon Valley tech giant, so you know what you’re learning will be directly applicable to the real world.
– Learn from experts in web design, coding, business and more. The video tutorials from Treehouse will teach you the basics and their quizzes and coding challenges will ensure the information sticks. And their UI is pretty easy on the eyes.
Learn from the best: top data miners to follow
– Chief Data Scientist at MailChimp and author of Data Smart, John is worth a follow for his witty yet poignant tweets on data science.
– Author and Chief Data Scientist at The White House OSTP, DJ tweets everything you’ve ever wanted to know about data in politics.
– He’s Editor-in-Chief of FiveThirtyEight, a blog that uses data to analyze news stories in Politics, Sports, and Current Events.
– As the Chief Data Scientist at Baidu, Andrew is responsible for some of the most groundbreaking developments in Machine Learning and Data Science.
– He might know pretty much everything there is to know about Big Data.
– He’s the author of the popular data science blog KDNuggets, the leading newsletter on data mining and knowledge discovery.
– As the Co-founder of OKCupid, Christian has access to one of the most unique datasets on the planet and he uses it to give fascinating insight into human nature, love, and relationships.
– He’s contributed to a number of data blogs and authored his own book on Applied Predictive Analytics. At the moment, Dean is Chief Data Scientist at SmarterHQ.
Practice what you’ve learned: data mining competitions
– This is the ultimate data mining competition. The world’s biggest corporations offer big prizes for solving their toughest data problems.
– The best way to learn is to teach. Stackoverflow offers the perfect forum for you to prove your data mining know-how by answering fellow enthusiasts’ questions.
– With a live leaderboard and interactive participation, TunedIT offers a great platform to flex your data mining muscles.
– You can find a number of nonprofit data mining challenges on DataDriven. All of your mining efforts will go towards a good cause.
– Another great site to answer questions on just about everything. There are plenty of curious data lovers on there asking for help with data mining and data science.
Meet your fellow data miner: social networks, groups and meetups
– As with many social media platforms, Facebook is a great place to meet and interact with people who have similar interests. There are a number of very active data mining groups you can join.
– If you’re looking for data mining experts in a particular field, look no further than LinkedIn. There are hundreds of data mining groups ranging from the generic to the hyper-specific. In short, there’s sure to be something for everyone.
– Want to meet your fellow data miners in person? Attend a meetup! Just search for data mining in your city and you’re sure to find an awesome group near you.
8 fantastic examples of data storytelling
Data storytelling is the realization of great data visualization. We’re seeing data that’s been analyzed well and presented in a way that someone who’s never even heard of data science can get it.
Google’s Cole Nussbaumer provides a friendly reminder of what data storytelling actually is: straightforward, strategic, elegant, and simple.
more on text and data mining in this IMS blog
The EU just told data mining startups to take their business elsewhere
By enabling the development and creation of big data for non-commercial use only, the European Commission has come up with a half-baked policy. Startups will be discouraged from mining in Europe and it will be impossible for companies to grow out of universities in the EU.
more on copyright and text and data mining in this IMS blog