Sunday, July 22, 2007

Peeking and Pottering

This morning, I gave in to temptation and searched for a summary of how
the Harry Potter series ends (no spoilers here). I haven't followed
the books, though I have seen the movies. I didn't feel temptation to
read the last book, but I still wanted to know what happened to the
characters. Funnily enough, having read a paragraph-length summary of
the ending, I now feel tempted to get the book and watch how the
characters' stories unfold to get there.

This is pretty reflective of my overall work style, come to think of
it. I enjoy figuring things out, but mostly when I have a good mental
picture of where that process will get me. When I first started
learning to use computers, I hated being told "just play with it". I
couldn't feel productive with so little structure. Don't get me
wrong: I hugely appreciate the educational value of exploring an
undefined space (I loathe asking for directions, given how much of a
map I figure out from being a little lost). I need some semblance of
a target (recognizable, if underspecified) to organize that
exploration, though.
Programming and engineering work this way, so my career is a good fit.

But back to the end of Harry Potter. I found a Slate article that
associated "read the end [of a book, in this case HP] first" with a
desire for instant gratification, then reported on a study linking
personality traits to instant gratification. The Slate author admits
the parallel between the study and end-of-HP-readers is a little off,
but for me at least, the parallel doesn't work at all. I don't look
at the end for gratification, but for organization. By knowing where
the story is going, I can watch for the hints and signs along the way
and follow how the structure of the story unfolds (without having to
read the book twice). I explore the characters differently once I
know where they will end up, and I enjoy the study of personality
cause-and-effect that reading this way provides.

That raises a question, then, of whether I read the ends of all books
first. Rarely. I only do this when I know the characters before I
start reading. Reading for me is a search process: looking for the
structure and choices that get from point A to point B. If I know
point A (through advance knowledge of characters), I want to know B so
I can search the characters' personalities from both ends. If I know
nothing at point A, I'm happy to let the writing construct the space,
as long as I expect that construction to be interesting. Knowing the
end can convince me that the construction might be worth reading. I
feel no remorse over having e-peeked at how HP ends, especially given
that my interest in the story overall is piqued as a result.

Tuesday, July 17, 2007

Outsourcing begins at home

CRA, the main advocacy group for Computer Science research, recently cited a report on how companies decide where to locate R&D efforts. This is a hot topic in CS circles, as perceptions of international outsourcing are at least partly responsible for the dramatic drop in student enrollments in recent years. [Note to students and parents: the perception is overhyped; more IT-related jobs are available now in the US than at the height of the boom, according to a taskforce on the subject.] According to the summary, costs are not the main factor in deciding where to locate R&D. Other economic factors, such as local growth potential and university infrastructure, are also key. If you prefer Richard Florida's view of a "spiky" world to Thomas Friedman's "flat" one, this summary will not surprise you.

This weekend's NYTimes ran an article about public schools' efforts to achieve racial diversity through socio-economic diversity (subscription required). While the article mostly discusses how districts have struggled to make this work in practice, it also cites successes in Raleigh, North Carolina, where performance on state reading tests has improved dramatically within traditionally poor-performing racial groups following socio-economic school assignments. The results are attributed to getting more disadvantaged students into schools with stronger cultures of achievement.

It's a fairly similar goal to outsourcing in many respects: establish a stronger innovation culture in countries with developing economies. This strengthens the local economy while creating new markets (and product ideas) for the company doing the outsourcing. The opening of R&D labs in China, India, and other Asian countries by western firms follows this vision. Yet doing this on the international scale induces much hand-wringing, in contrast to the praise the same effort earns when done within our own borders. Success on both fronts could create competition for jobs at home, but we fear the international competition more, partly because revenues leave our borders. Perhaps deep down we also believe international groups will make progress faster than our own disadvantaged groups. The CRA-cited report describes different metrics used to gauge outsourcing to developed versus developing economies. Where do our own disadvantaged communities fall in that spectrum?

Of course, investing at home or abroad isn't an either-or decision. The juxtaposition of these articles just reminded me that we talk a fair bit about international economic development without nearly as much focus on how to foster domestic talent. That seems to say a lot about our problems here at home.

Wednesday, July 4, 2007

Fragments of Health Policy

Tuesday's New York Times ran an article (subscription required) on people who are denied access to health-related info (for themselves or family members) because medical personnel do not know how to interpret HIPAA regulations. HIPAA is one of the key pieces of US legislation governing privacy of medical information. Medical personnel apparently deny access in many cases that HIPAA would allow because the boundaries of the law are either unclear or unknown to the personnel.

Given the liability involved, denying access is an obvious default. Denying access beyond the intent of the policy becomes problematic, however, when it prevents people from getting information they need in order to get something legitimate done. The balance between protection and availability is hard to get right. When the people charged with enforcing the policy are unclear on its scope, the balance skews towards protection and more legitimate tasks are prevented. HIPAA permits but does not require disclosure in a wide range of cases, so many err on the side of caution and choose not to disclose (sometimes for other reasons, but using HIPAA as an excuse). Disclosure policies are left to individual health providers.

In addition to the original article, the Times also published an interview with the deputy director for information privacy for the US Dept of Health and Human Services. The interviewer asked why disclosure is left to providers, rather than included as part of the original policy. The answer was that the department was charged with developing privacy policy, not disclosure policy. The intent was for providers to develop disclosure policies, as long as they didn't violate HIPAA regulations.

The real problem here then, is not one of people not understanding HIPAA, but one of people not understanding that HIPAA is a _component_ of health information policy, but not the entire policy. Language standards for writing access-control policies, such as XACML, have evolved to support policies that handle some issues but not others. If one were to encode HIPAA in XACML, most of the disclosure cases would produce a decision called "not applicable", instead of one called "deny". This would enable the hospital to write its own policy for the disclosure cases. The hospital's overall policy would be a combination (technically, composition) of the HIPAA and local policies. Either sub-policy could issue a decision on a request for information, using the "not applicable" answer to say "this is outside my scope, let the other policy decide".
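
To make the composition concrete, here is a minimal sketch in Python (not real XACML syntax; the rule conditions, request fields, and policy contents are all invented for illustration). A HIPAA-style policy answers "not applicable" on requests outside its scope, and a first-applicable combinator lets the hospital's own policy decide those cases:

    # A toy three-valued policy model: permit, deny, or not applicable.
    from enum import Enum

    class Decision(Enum):
        PERMIT = "permit"
        DENY = "deny"
        NOT_APPLICABLE = "not applicable"

    def federal_policy(request):
        # Hypothetical HIPAA-like fragment: it rules only on requests
        # within its scope; everything else is NOT_APPLICABLE.
        if request["purpose"] == "marketing":
            return Decision.DENY
        if request["requester"] == "patient":
            return Decision.PERMIT
        return Decision.NOT_APPLICABLE   # e.g., family-member requests

    def hospital_policy(request):
        # Hypothetical local disclosure policy for the cases left open.
        if request["requester"] == "family" and request["consented"]:
            return Decision.PERMIT
        return Decision.DENY

    def first_applicable(policies, request):
        # Composition: return the first decision that isn't NOT_APPLICABLE.
        for policy in policies:
            decision = policy(request)
            if decision is not Decision.NOT_APPLICABLE:
                return decision
        return Decision.NOT_APPLICABLE

    request = {"requester": "family", "purpose": "care", "consented": True}
    print(first_applicable([federal_policy, hospital_policy], request))
    # prints Decision.PERMIT: the federal policy abstains, the local one decides

The third decision value is doing all the work here: if federal_policy answered "deny" for everything outside its scope (as many practitioners effectively assume HIPAA does), the hospital policy would never get a say.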

In short, HIPAA is a fantastic example of why policies need to have three possible decisions (permit, deny, not applicable) rather than just two (permit, deny). Practitioners are defaulting what should be "not applicable" to "deny", skewing the intent of the policies. Fixing this requires making people aware that HIPAA governs only a fragment of information access issues and having them learn the local policies as well as the federal one. With only a couple of policies, this isn't hard, but it is subtle: one policy is rarely enough. The trap is that, since HIPAA looks like a complete policy, it gets overapplied. The policy language has to be subtle enough to distinguish "required" from "permitted" from "disallowed". The language is there, but its subtlety, and hence information access, is seemingly lost on many people.

Tuesday, July 3, 2007

The Physical Abstraction Layer

Yesterday morning I was stuck on a research problem, so I went out for a run. Athletes often remark on how participating in sport taught them important lessons about time management and perseverance in other aspects of their lives. After pushing myself through a temptation to reduce the run to a walk, I ditched thinking about the research problem and turned to the question of athletic perseverance: why is it easier to push myself to keep running than to keep pushing myself through a research problem?

I believe it comes down to a simple abstraction. Running offers three basic levels of movement: walking, shuffling/light jogging, and running. Once I'm walking, I've stopped doing what I'm really trying to do. Once I'm shuffling, I've stopped respecting the time I allocated to getting some exercise. In short, there's an obvious metric for whether or not I'm (a) running and (b) doing a decent job at it.

Research, in contrast, offers more of a continuum of effort. There are some very concrete states (e.g. writing a paper, preparing a talk), but the creative portions of research are much more open ended. I can look like I'm working without really having my attention on my work. There's also a catch-22 to monitoring my attention to research: once I'm checking that I'm focused on a problem, I'm no longer focused solely on the problem! It's fairly easy to fake research effort and convince myself that I am really working when I'm not really there. Running doesn't let me get away with that.

So what does running teach me about perseverance in research? It doesn't teach me _how_ to persevere, or even how to recognize that I'm off-track. It does remind me that one can push through temptations to stop. It's attitude-conditioning. This isn't always a good thing though: I've been guilty of spending many a truly unproductive hour sitting at my desk trying to force work that my mind simply wasn't up to at that point. Research has taught me that giving up and walking away is often extremely valuable, especially since my mind keeps working subconsciously (which is why running helps with research in the first place).

Which brings me back to my original question: what does sport teach us about perseverance in creative fields? I concluded that it actually doesn't teach me much. Running works for me because my creative brain works better when "distracted" by a simple run. I'm glad that running has such clean states, because I want a simple metric for "good enough" when I'm trying to restore rhythm to my day. It doesn't scale to the sorts of perseverance needed in the continuum of creative work. There, it's far too easy to fool yourself.

Sunday, July 1, 2007

Breaking down the thought process of computer science

Last month, Language Log had a post on how people learn to think like their professions. The article was written by a linguist, reporting on recent books by a doctor and a lawyer. According to the article, doctors and lawyers learn to think like their professions by comparing information obtained from patients/clients against the treatments and case law that they studied in school (I haven't read either book yet). From a computer science perspective, the description resembles searching a mental database for facts that match the situation at hand. The significant challenge here is presumably in figuring out which queries to pose against that mental database, as the queries reflect which components of a situation are likely to be relevant, which need to be explored in tandem, which need to be generalized, etc. Having a good mental database is obviously also important, but the query construction process seems more reflective of how professionals think.

How does this compare to thinking like a computer scientist? As with most disciplines, we draw on experience and compare situations to figure out how to solve problems. Query formation remains a significant component of how we think as professionals: a computer scientist needs to know what questions to ask about performance, security, reliability, usability, and a host of other system-related -ilities. But our mental database construction problem seems more substantial as well because of the volatile, unregulated, and still mysterious nature of computational systems.

Both law and medicine build heavily on precedent and legal bounds on practice; this shapes the space in which they search for problem solutions. Computing lacks the legal regulation of medicine and law (recall Parnas' oft-cited call to replace disclaimers with warranties in software). Many doctors and lawyers deal mostly with cases that fit existing precedents (the challenge becomes which ones to apply, but the diseases or situations themselves don't change as fast as computing technologies). Law seems to deal with fewer interacting agents than medical or computing problems; medicine seems to have a richer set of diagnostics for exploring how treatments behave than we often have for computing systems. Living organisms also seem more fault-tolerant, on the whole, than computing systems, which are still very brittle. On the flip side, computing systems lack the complexity of the human brain or body, but I suspect the average computing professional confronts our limits in handling complexity more often than the average doctor does (doctors can refer patients to specialists for complex cases).

When we train students to be computer scientists, we really need to train them in the science of how discrete (as opposed to continuous) systems break. They need to think about how someone might attack the system, circumvent it, or use it for harm. They need to think about how to keep the system maintainable in light of new features and new technologies. Our mental databases need as many facts about which decisions lead to what problems as about which lead to what solutions. This is somewhat true of medicine, but I again suspect that average programmers deal with this more often than average doctors (beyond drug interactions, which are fairly well documented).

Not many computing programs really take the study of breakage seriously. We spend a lot of time focusing on systems that do work, that are correct, that perform well, etc. These are all necessary and valuable. But when your goal is to make something break, you ask yourself different questions than when your goal is to make it work: there's a continuum from "working" to "broken" and the missing questions lie in the middle. How many students really learn to stress-test their code, to inject faults and study code behavior, to put their work in the hands of novice users, to attack others' systems so they can think about how someone might attack their own? We have the luxury of working in a science of the artificial in which we can try to break something without compromising an organism's health. How could we best exploit that opportunity within the time constraints of a university education?
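
As a small example of the kind of exercise I have in mind, here is a sketch in Python (the flaky store, failure rate, and retry logic are all invented for illustration) of injecting faults into a toy dependency and measuring how the calling code behaves:

    # A toy fault-injection exercise: make a dependency fail at a known
    # rate, then measure how well the code under test copes.
    import random

    class FlakyStore:
        # An in-memory key-value store whose reads fail with probability p_fail.
        def __init__(self, p_fail=0.5, seed=42):
            self.data = {}
            self.p_fail = p_fail
            self.rng = random.Random(seed)

        def put(self, key, value):
            self.data[key] = value

        def get(self, key):
            if self.rng.random() < self.p_fail:
                raise IOError("injected fault: read failed")
            return self.data[key]

    def lookup_with_retry(store, key, attempts=3):
        # The code under test: does retrying make it robust enough?
        for _ in range(attempts):
            try:
                return store.get(key)
            except IOError:
                pass   # a real system would log and back off here
        raise RuntimeError("gave up after %d attempts" % attempts)

    store = FlakyStore(p_fail=0.5)
    store.put("x", 1)
    failures = 0
    for _ in range(1000):
        try:
            lookup_with_retry(store, "x")
        except RuntimeError:
            failures += 1
    print("%d of 1000 lookups failed despite retries" % failures)

The questions this raises live exactly in the middle of the working-to-broken continuum: the retry loop "works", yet roughly an eighth of the lookups still fail. Is that acceptable, and what would it take to do better?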