Monday, July 29, 2013

Is It Time To Go Big? The Case for Big Data in Student Affairs

Big data gets a bad rap sometimes. The vast amount of information that exists about a person often raises the specter of Big Brother in ways that were not even possible just a decade ago. In fact, it seems as if everywhere we turn, our online presence exists solely to expose us to others. The government seems dead set on hoarding that information to prosecute you, Google wants to aggregate it so they can be your advertising portal, and Facebook seems intent on exploiting your data to sell wholesale to other marketing firms.

But can this explosion of information be useful? Moreover, can we use it to promote student development and success in higher education? The more I consider the scope of the approach, the more I am coming around to believing that the Big Data philosophy is beneficial for higher education for a number of reasons.

What is Big Data?

Big Data, broadly defined, is an approach to data analysis that can only be done on a big scale to "extract new insights or create new forms of value in ways that change...organizations and institutions" (Mayer-Schonberger & Cukier, 2013, p. 6). In many ways, it is as much a philosophy and practice as it is a methodology.

Google is perhaps the biggest advocate and practitioner of Big Data. Its search analytics depend upon consuming massive amounts of data to provide the most efficient, effective, and useful results to large groups of people who themselves may not know exactly what they are searching for.

In practice, though, it has far-reaching implications for assessment and proactive programming. Borrowing from the book "Big Data: A Revolution That Will Transform How We Live, Work, and Think," I would reference an experiment that Google ran to compare its analytics against those of the Centers for Disease Control and Prevention (CDC). In years past, the CDC has used a series of forms, surveys, and reports from hospitals, clinics, and individual doctors around the country to predict which strains will circulate and how severe flu seasons will be. It uses that data to order flu vaccines and send supplies where they might be needed most. This can take weeks at best to process and consumes significant people power at the CDC.

Google had an alternative theory. They would use the massive numbers of searches to try to predict flu outbreaks in near real time. For the past two years, the Google model matched the CDC's own internal predictive model while beating it on speed. While this year's results were not as effective, many experts remain hopeful that, with continued tweaking of the algorithms to adapt to public fears, the predictive model will once again prove accurate.

In my mind, what makes the Google model particularly useful is that when they first released their flu tracking model, they did not ask "why" things happen but looked specifically at "what" happens. By aggregating their data, they could start with nearly 45 different variables and ultimately settle on 16 different search terms that had the highest correlation with actual flu outbreaks.
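To make the "what, not why" idea concrete, here is a minimal sketch of that kind of selection step: score each candidate search term by how closely its weekly volume tracks reported flu cases, then keep the top few. This is only an illustration, not Google's actual method. The search terms, the weekly numbers, and the function names (pearson, top_terms) are all made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)


def top_terms(term_counts, flu_cases, k):
    """Rank terms by correlation with flu cases and keep the k best."""
    scored = [(term, pearson(counts, flu_cases))
              for term, counts in term_counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical data: six weeks of reported flu cases and search volumes.
weekly_flu_cases = [10, 20, 40, 80, 40, 20]
weekly_searches = {
    "flu symptoms": [100, 210, 390, 810, 420, 190],
    "cough medicine": [50, 70, 120, 200, 130, 60],
    "movie times": [300, 280, 310, 290, 305, 295],
}

selected = top_terms(weekly_searches, weekly_flu_cases, k=2)
for term, score in selected:
    print(f"{term}: {score:.3f}")
```

Notice that nothing in the sketch asks why a term tracks the flu. It only measures whether it does, which is exactly the shift in thinking described above.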

But that is a fundamental shift in terms of philosophy for higher education. Being intentional is not just concerned with WHAT you are doing but HOW you are doing it and WHY you are doing it. Big Data, on its face, appears to reject the intentionality that has been a cornerstone of higher education and student affairs.

In many ways, the field of education is already wrestling with this fundamental question when it comes to holding institutions accountable for their education. In high schools and higher education alike, public officials have pushed for strict measures of accountability that look at loan repayment, mean incomes of graduates, graduation and persistence rates, as well as standardized test scores. The problems with these metrics are plentiful, though. The Educational Policy Institute indicates that there are a significant number of factors that have adverse impacts on student learning. Factors ranging from peer interactions to family support to the physical conditions of schools all have significant impacts on students, but not every student is affected in exactly the same way. On the other hand, there are no universally accepted ideas about what constitutes the skills and content mastery necessary for individual success, only broadly defined terms that can be manifested in different ways by different students.

Big Data, as a philosophy, can help bridge that gap since it doesn't rely on one standard metric of success. Moreover, obtaining and retaining data is significantly cheaper and more accessible than it has ever been. Even further, Big Data has already demonstrated success in certain areas of higher education.

Big Data and Higher Education

Big Data, or the collection of massive amounts of data to sift through, is increasingly gaining momentum in higher education. From a purely educational standpoint, it is most pervasive in the "crowd sourcing" of knowledge. On one hand, crowd sourcing has been successful in identifying new galaxies culled from images collected by thousands of telescopes. On the other hand, crowd sourcing has allowed for a large collection of literary works to be shared by a great many students. However, it has also developed enthusiasts and produced results in other areas of higher education, specifically when it comes to bolstering persistence and retention efforts.

EBI-MAPWORKS is one such initiative that aggregates massive amounts of data and allows everyday professionals to, at a glance, determine a student's risk of dropping out of an institution. The program begins with a pretty significant first-year survey and then factors in a wide variety of variables including demographic information, high school grades, test scores, and other incidents that happen on campus. These factors include student activities involvement, residency status, financial status and confidence, academic skills, social worries, and overall academic goals and dedication to completion.

MAP Works goes one step further, though, in that it allows a wide variety of faculty, staff, and students (specifically residence assistants) to input notes and track the number of contacts between the University and its students. In this way, nearly every contact a student has becomes one more "data point" with which to evaluate their overall experience with the University.

It is the MAP Works philosophy that intrigues me the most about changing the way that student affairs and higher education pursue assessment initiatives. In the face of rising costs, the public has demanded a quick and easy way to "hold educators accountable" for the sizable investment that both the public and private spheres are making in an individual's experience. In my mind, it bridges the gap between today's policy makers' desire for a raw "beneficial or not beneficial" response and the need to account for an ever-growing diversity of experiences in higher education, experiences that are hard to encapsulate with a standardized test and that otherwise require waiting as long as six years to see whether a student graduates and what they do with their education.

Big Data, as a practice, also fundamentally recognizes the intrinsic value that education holds by giving us new tools to measure growth in ways that we haven't been able to before. But it requires a change of philosophy that isn't necessarily intuitive.

1. We will never know with absolute truth the full experience of every student - Students as individuals have a plethora of different experiences and come from very different backgrounds. Different students are affected by their backgrounds in different ways. Elisa Abes writes in her theory about intersecting identities that salient identities can change over time, are often socially contextual, and can have differing impacts at different times in a person's life. Quantitative analysis has attempted to create constructs and isolate the vast majority of variables, but the social fabric of our universities is constantly in flux, and the time it takes to construct theory-to-practice models oftentimes creates the false sense that theory (as represented by the study of higher education) and practice (as implemented by student affairs professionals) are two very different things.

Student affairs professionals have responded by flooding campuses with quick surveys, but as with any survey, you make intentional decisions about what questions to add and what to leave off in order to produce the most useful information without creating the widely recognized "survey fatigue" effect among students.

2. Intentionality is not the end-all, be-all of assessment - This is truly counterintuitive. When time, space, and effort are precious commodities, haphazard data collection is seen as the enemy of best practices. The Big Data approach does not reject the need to be intentional in the questions we ask, but rather holds that we need to ask more questions, more often.

As I said earlier, data facilities are becoming much cheaper than they once were and will likely decrease in cost as technology gets more advanced. Today, our ability to store electronic data would allow us to digitally encode all of the written literature in human history roughly ten times over without much concern.

The big concern is what to do with so much data. As the Google flu predictor shows us, so long as we can identify the right variables to search for, our predictions and assessments can be pretty accurate without the massive time and effort commitment of other methods. However, the trends are constantly changing, so our search parameters must change as well, lest our predictions be off base. In this way, the study of higher education and student affairs go hand in hand as we constantly implement theory in our measurements.

3. Conceptions of privacy will have to change - Right now, Big Data does feel a lot like Big Brother. Specifically, there are crucial elements of data that are legally prohibited from being collected and shared with other staff at a University. This is perhaps the biggest roadblock to adopting a true Big Data approach to higher education. FERPA is a big deal for a variety of reasons, and it is not something I wish to challenge lightly. However, there are some compelling reasons to at least reconceptualize what FERPA looks like in today's institutions.

First, the very concept of privacy is changing in the United States. If there is one thing that came out of the exposure of the National Security Agency's metadata collection (a technique already being used by the Post Office, while the FBI regularly trawls internet searches to identify problematic trends), it is that the American public is not as concerned with internet privacy as it once seemed.

Students in particular have exposed themselves in ways that were nearly impossible a decade ago. The rise of Facebook, Twitter, and Tumblr has created a new public square that is not limited to physical presence and speech. Foursquare, Instagram, and Vine have also pushed the boundaries of fundamental conceptions of privacy as we become more and more willing to put ourselves out over a medium that is not fundamentally secure.

This is not to say that we have given up on the idea of privacy, since privacy locks are still a major consumer demand. The shift is in the base level concern for creating a digital footprint. At some point in time, these services are all fundamentally designed to be SHARED.

The only question is SHARED with whom.

I would argue that when it comes to academic success and tracking the development of students, institutions of higher education have a compelling reason to know as much about their student populations as possible.

Which is NOT to say that disciplinary boards have a compelling reason to know about every single student infraction, but rather that the field of higher education and purposeful programming REQUIRES us to know as much as possible about broad groups.

My response then to privacy concerns is less to limit what is already being put out by students explicitly or implicitly, but to limit who has access to the specific data that allows us to utilize aggregated trends. This is the principle behind academic records collection in the first place and is a fundamental tenet of the EBI-MapWorks program.
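One way to picture that separation between specific records and aggregated trends is the rough sketch below. Most staff would see only group-level averages, and any group too small to hide an individual is left out of the report. To be clear, this is my own illustration and not how EBI-MapWorks works. The record fields, the risk scores, the aggregate_risk function, and the minimum group size of three are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical threshold: groups smaller than this are not reported,
# so a single student cannot be picked out of an aggregate.
MIN_GROUP_SIZE = 3


def aggregate_risk(records, group_key, min_size=MIN_GROUP_SIZE):
    """Return the average risk score per group, omitting small groups."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record["risk"])
    return {
        group: round(sum(scores) / len(scores), 2)
        for group, scores in groups.items()
        if len(scores) >= min_size
    }


# Hypothetical student-level records that only a few staff would see.
records = [
    {"student_id": 1, "residency": "on-campus", "risk": 0.2},
    {"student_id": 2, "residency": "on-campus", "risk": 0.4},
    {"student_id": 3, "residency": "on-campus", "risk": 0.3},
    {"student_id": 4, "residency": "commuter", "risk": 0.7},
]

# The aggregated view that could be shared more widely.
summary = aggregate_risk(records, "residency")
print(summary)
```

In this made-up data, the lone commuter student never appears in the shared summary, while the on-campus trend does.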

Conclusion

If we want to get the best picture of what institutions are doing for their students, we cannot limit ourselves to artificial indicators based on the subjective desires of individuals. We must look at the whole student and whole groups of students. Currently, we have devised a plethora of tools that look at individual components of the student experience, but the time has come to step up our game.

The great part is that the information is already out there to bridge the gap between the quantitative assessments and the qualitative experiences. We just need to devise a mechanism and a philosophy that encourages us to admit that we don't know everything. We also need to admit that theory to practice is messy. Theories can be sound but limiting, or expansive but weak in their descriptive and predictive powers. Similarly, a good theory can be messed up by poor implementation while a bad theory can be adapted to be useful by a great practitioner.

Knowledge is rarely constructed in a vacuum and student affairs is certainly not practiced in one. Whether we are looking at risk factors of a student and trying to determine a proper intervention or investigating a student organization for hazing, more information (if properly queried) can lead to better outcomes. When we aggregate the data we increase our ability to make predictive decisions based less on subjective and artifactual observations. Nor does the approach require constant surveys. Let us instead ask for little bits of information from a lot of people, all of whom are interacting with our students.

Big Data lets us bridge that gap. As a philosophy and a practice, I believe it shows a lot of promise.

The information genie is already out of the bottle. Shouldn't we make the most of it?