Saturday, November 07, 2009

I give up on CHI/UIST

The CHI reviews just came out and I have to say I'm pretty unhappy... not with the numbers per se... (one paper I co-authored has a 4.5 average out of 5 and I'm sure I'll get a fair number of papers accepted), but instead with the attitude in the reviews. The reviewers simply do not value the difficulty of building real systems and how hard controlled studies are to run on real systems for real tasks. This is in contrast with how easy it is to build new interaction techniques and then to run tight, controlled studies on these new techniques with small, artificial tasks (don't tell me this is not true as I have done it and published good papers of this style also).

I really am ready to give up on CHI / UIST and go elsewhere (to another existing community or create a new one -- UISTSys anyone?).
I've talked about this for 3-5 years with many of you, but I think I've finally had it as there has really been no change. In fact, I think it has gotten worse. The highest ranked paper we wrote took 6-10 weeks of work and is well written, interesting to read, and synthesizes many studies in multiple communities. It is valuable to the CHI community, but it invents nothing new. I'd love to see it published at CHI and I think there should be room for multiple kinds of work at CHI (including nice surveys, opinion pieces, interaction techniques, fieldwork, and systems work).

The papers we have submitted with truly new ideas and techniques, and years of work behind them, get reviews asking you to do 2-4 years more work. For example, they ask you to create a completely different system by another team with no knowledge of your ideas and run an A vs. B test (because that commercial system you compared to had different goals in mind). Oh, and 8-10 participants doing 3-4 hour sessions/participant isn't enough for an evaluation. You need lots more... They go on and on like this. Essentially setting you up for a level of rigor that is almost impossible to meet in the career of a graduate student.

This attitude is a joke and it offers researchers no incentive to do systems work. Why should they? Why should we put 3-4 person years into every CHI publication? Instead we can do 8 weeks of work on an idea piece or create a new interaction technique and test it tightly in 8-12 weeks and get a full CHI paper. I know it is not about counting publications, but until hiring and tenure policies change, this is essentially what happens in the real world. The HCI systems student with 3 papers over their career won't even get an interview. Nor will any systems papers win best paper awards (yes, it happens occasionally but I know for a fact that they are usually the ones written by big teams doing 3-4 person-years of work).

Don't tell me that as much systems work is appearing now as in the past. It is not true and many of the systems papers that do get in require big teams (yes, 3-4 person-years for each paper). When will this community wake up and understand that they are going to drive out any work on creating new systems (rather than small pieces of systems) and cede that important endeavor to industry?

One might think that the recent papers on this topic by Dan Olsen at UIST and Saul Greenberg/Bill Buxton at CHI would have changed things, but I do not believe the community is listening. What is interesting is that it is probably the HCI systems researchers themselves who are at fault. We are our own worst enemies. I think we have been blinded by the perception that "true scientific" research is only found in controlled experiments and nice statistics.

What is the answer? I believe we need a new conference that values HCI systems work. I also have come to agree with Jonathan Grudin that conference acceptance rates need to be much higher so that interesting, innovative work is not left out (e.g., I'd advocate 30-35%), while coupling this conference with a coordinated, prestigious journal that has a fast publication cycle (e.g., electronic publication less than 6 months from when the conference publication first appears). This would allow the best of both worlds: systems publications to be seen by the larger community, with the time (9-12 months) to do additional work and make the research more rigorous.

Addendum:
This post started as a status update on Facebook, but I quickly went over the maximum size for a status update (which I had never run into before). Thus, this blog post. Note that this was written hastily and late at night. Don't take this as a scholarly attempt to solve this problem (i.e., I cite no statistics to back up my claims here!). Also, this is not an attempt to influence the PC on my papers under review. I couldn't really care less about any individual paper. It is the trend over time that has me upset. I've done quite well at publishing at CHI so it is not about sour grapes. It is more frustration at how hard it is to publish the papers that I believe are the most important. If it is happening to me it is happening to many other people.

85 comments:

John Mount said...

Good post James, hope you get more positive feedback than undeserved heat.

jofish said...

So I note that I feel very similar about my CHI reviews (which came back significantly less well than yours, I'm somewhat frustrated to say), but my personal version of this rant search-and-replaces "HCI systems" with "ethnographic studies", and I think that says something important.

Partly this is the whole "everyone at CHI is a minority" problem. Let's say there's fifteen percent systems people, and they feel shat on by everyone else because everyone else doesn't understand the difficulties and above all the way that useful knowledge is generated in systems. Then there's the qualitative/ethnographic people, and let's say there's fifteen percent of them, and they feel shat on by everyone else because etc etc. And then there's fifteen percent of people doing stunningly small incremental additions to Fitts's Law, and fifteen percent people who want to count the entirety of user experience on a scale of one to five in double-blind experiments (you know, LIKE SCIENCE) and fifteen percent artists and fifteen percent designers and fifteen percent games people and so on and so on. It means that everyone's contributions are always being judged by people who don't necessarily understand the nature of the contribution, and that's a problem.

So one way is to go off and only participate in smaller conferences (and I can't really speak to UIST, as I've only been once) where there's more of a shared epistemological approach. But that, too, can get scuppered. I was at Creativity & Cognition the other day, and I felt like a third of the people felt creativity was really hard to do and rare and were studying that, a third felt cognition was really hard and were studying that, and a third were really just publishing there as an alternative conference with a reasonable acceptance rate that got them a line on their resume.

So I do wonder if going off and starting a new conference is the right way to go. It might help in the short term, but it's not clear that it really solves the problem. And there's issues of hiring: quantity of papers and acceptance rates at the conferences they're published at is one proxy measure for quality of researcher, and it's not ideal. Our field is at least relatively reticent (or just very quiet?) about using formal measures to evaluate job candidates like H-indexes or citation counts, but there will invariably be ways that young researchers, who have something to prove, will find ways to prove it.

Finally, on that topic: the journal you describe I think really might be a valuable contribution to the field, and let's face it, you're personally in a very good place to make that happen. I dare you.

Jofish

Bill Buxton said...

It worries me when some of the people who have the most experience and the most to contribute express such views. The good or bad news, depending on your perspective, is that you are not alone.

I share your frustrations - to the point that about 8-10 years ago I gave an invited talk at SIGCHI which was about why CHI was becoming irrelevant, and why I was never coming back. I held to that vow for 4-5 years, but eventually returned. My view was that while CHI was not what I wanted it to be, it had a useful and legitimate function - albeit one that did not (IMHO) fulfill its true potential. For what was missing, I just went elsewhere.

What worries me is that while you and I are far enough along in our career that we have choices, it is much harder for those who constitute the future of the field. The shortcomings of CHI risk biasing their work along a potentially non-optimal path - due to the perceived importance to have CHI publications in one's c.v.

The reality is that while I fully respect the values and methodologies of experimental research, I also equally understand that while an essential part of our discipline, it is not sufficient.

All the rigour in the world applied to atomic-level tasks does not shed sufficient light on how to put those low level tasks together into a coherent system. Nor can more complex systems be studied in a holistic way with the same formal control as low level tasks. Yet, the impact of how these low-level tasks are put together generally has far more impact on the overall user experience than how optimally those atomic tasks are implemented themselves. Somehow, we need to find ways to bring meaningful commentary on these holistic considerations to the fore - if we want to achieve the impact on real systems that I would hope we aspire for.

When I said that I was leaving CHI, my argument was that the GUI that was the standard at that time, emerged without any of the CHI literature. Both SIGCHI and the GUI "launched" at about the same time. My point was that the accumulated research of over 20 years of publications in the CHI literature had arguably not had anything near the impact on real users and systems that the GUI that happened with none of that literature.

That is not to say that the literature had no value. Rather, I just wanted to suggest that with a different approach, it might have had greater impact - without throwing the baby out with the proverbial bath-water.

CHI is an important part of the eco-system of our field. I would far rather have it reflect the same level of evolution and innovation in its structure as it aspires to bring to the practice that is its domain.

Some of us have tried to bring about change from within. To CHI's credit, it gave efforts like the paper that Saul and I submitted a fair hearing. So, things are not hopeless. But they remain frustrating, and I hope that the impact of your comments is not that you leave, but that you provide the catalyst for an invigorated discussion of the topic - along with some needed change, while preserving that which is strong.

Thanks for sharing your thoughts. I hope this note is respectful to both your intent, as well as the CHI community as a whole.

Larry said...

James - Wasn't UIST created in response to the same problem?
/Larry Rowe

James A. Landay said...

Larry, the founding of UIST is before my time but I believe what you cite was one of the reasons for the start of UIST. Over the last five years the UIST PC has looked more and more like the CHI PC. Although it is still more systems oriented than CHI, the program has come to be dominated year after year by more and more interaction techniques papers and thus the expectation that most papers at UIST will have similarly tight controlled experiments [note: not all, but it is much harder to publish without it]. I have a hard time finding much difference between UIST and CHI these days in the style of papers from the CS sub-community of CHI (which is who goes to UIST).

James A. Landay said...

Thanks John. I'm sure I'll get lots of constructive feedback to this post.

James A. Landay said...

Jofish, I used to think the main problem was the diversity of CHI and thus the different values in the reviews. I really don't think that is it anymore for two reasons. First, the sub-committee structure removes much of that (btw, I was very happy with how the subcommittee I served on last year worked and I'm sure the others also worked well). The second reason is that I see this same problem at UIST, which is again a much narrower community. It is in my opinion a fundamental problem with what many people think of as "research" in this community. This includes many "systems" HCI folks. I think the senior members of the community (I am now one of them) have done a bad job of educating the community and our own students.

Jeff said...

I have to say that my recent experience with CHI/UIST is a bit different than yours. Take for example this past UIST. I submitted two systems papers to that conference and one was accepted. The one that was rejected was the better implemented system with a deeper evaluation, and the one that was accepted was the more novel but also much more preliminary system. Although I'm still shocked at that particular result, I think it suggests that there is some support for accepting novel systems work without requiring a tremendously deep evaluation.

My experience with UIST 2008 and CHI 2009 papers was a little different, but also gave me hope for systems papers. All three of the systems papers that were eventually accepted to those two conferences came back with borderline reviews and ridiculous reviewer requests. I remember specifically for the PIE paper that a reviewer and an AC requested that we do a year-long deployment of the PIE system with people outside of IBM. For me, that suggests that the program committee is balancing the difficulties of writing/evaluating systems papers when final decisions are made.

I think that this also suggests that the wrong time to react is in the middle of the review process. There's still a ways to go and I think some of these issues are mediated by the program committee.

If you do decide to go to a different community, it's worth considering whether the process in a new community is more or less conducive to systems papers. For example, we have meta-reviews and rebuttals, which other communities don't (for example, WWW). I think both of those features make it easier to publish systems papers than it would be otherwise. The value of meta-reviews is obvious, I think, but becomes even more apparent when you try to publish in other communities (like WWW) that don't have them. Rebuttals are also key for systems papers, because they allow you to address the inevitable weaknesses that crop up in most systems papers. For many of my papers, I think the rebuttal has been key to getting the paper accepted.

Finally, I also think we need to spend more time educating people on how to write good systems papers. I read far too many that frame the contribution as the system itself. While I'm sure the system in many of those papers has value, in my opinion this is usually a bad strategy. A much better strategy is to identify the key novelty of your system and sell that as the contribution. This makes it easier for reviewers to understand what is new, and it also allows you to focus the evaluation. Maybe this pushes the focus to a piece of the system instead of the system as a whole though, and upon re-reading your post I see that you've differentiated those two types of papers.


On a completely separate note, we're working on lowering the turn-around time for TOCHI. In a year or two, maybe we'll be at the point where we can go from submission to electronic publication in 6 months (our goal is to reach a decision in 4 months...electronic publication upon acceptance is something that should be easily possible but ACM isn't doing yet).

James A. Landay said...

Bill, great comments and I really respect your opinion and experience here. I probably will not leave CHI but still think some new forum needs to be started (as other fields have done) as I really do not think it is possible to fix this problem.

The thing I worry about most is what you write here: "The shortcomings of CHI risk biasing their work along a potentially non-optimal path - due to the perceived importance to have CHI publications in one's c.v." This is the major problem with CHI. The values for tenure and hiring need to change (at least in CS -- I can't speak to other fields).

Saul Greenberg said...

I don't give up on CHI/UIST. More than any other venue, CHI, UIST and CSCW are home. Are they perfect? No. Can they be changed over time? Yes. As senior members of the community, it is up to us to prod, cajole, alter, etc. the community. We have to remember that there is no central control, no CHI Mafia who are orchestrating things. Rather, it is the thoughts of the community as a whole (and how we ourselves referee and judge papers) that decide what and who we are.
Of course, it takes time. I take heart in that the CHI Community and what it accepts has changed drastically since I attended my first CHI in the early 80s. I also take heart in that there are myriads of special interest conferences that have spun off of CHI, and that these affect how mainstream CHI works as well.

So my suggestion - take the lead to change things. Some of us are already doing this by opening up the discussion (Jonathan and James do this very very well). Some also serve this by taking on leading roles in the CHI Community and trying to effect change there (I tried a bit of this last year, and Scott is continuing to push it). What surprises me in the debate is that the strongest advocates for change are often the leaders who have been in the process for decades (e.g., Dan Olsen); it's often the younger newbies that we have to convince, as it's easier for them to follow the so-called CHI formula for success.

I have also been thinking that I may create a new keynote talk for those times I am asked to give such talks: The importance of Computer Science/Systems to HCI. As senior folks, we can direct the message to our audience.

Ok, I am optimistic, but why not?

Bill Buxton said...

Just a quick historical point and comment. Yes, when a group of us had the conversations that led to UIST, the point was to provide a forum for work that did not cleanly fall into that of the SIGCHI Conference.

It is worth noting that UIST was, nevertheless supported by CHI, and part of the larger associated eco-system.

Second, the formation of UIST was an example of how a small group of individuals could help address a perceived problem.

Third, once these complementary forums are established, I think it is important to be constantly attentive to maintaining the unique character and purpose of these other conferences, and not have them slip into a kind of convergence, where they become more like CHI, rather than reflect the rich mosaic of the field.

I think that for the most part, we have done a reasonable job of this. It is, nevertheless, important to remain vigilant, and not take things for granted.

Finally, I think it perfectly reasonable to always question if the nature of the current mosaic adequately reflects the field as it exists today, rather than that when these conferences were first set up.

Rather than just question the structure of SIGCHI, perhaps we might better look at the suite of associated conferences and ask (1) does each still have the right focus, and (2) what gaps have opened up that might need specific action to address - perhaps in a manner analogous to how we addressed that of UIST, CSCW, Tabletop, etc.?

In short:

1. Let's look at the eco-system, not just SIGCHI.
2. By having the right eco-system, where each outlet is at a high standard, a worthy forum can be provided for those in the formative stages of their career, and one which avoids some of the existing biases.

Just some thoughts triggered by the comments so far.

Jonathan Grudin said...

Complaints about the conferences are a hardy perennial, but largely missing are thoughtful analyses of why things are the way they are. The implication is that if people just woke up, they could fix things, but my sense is that there are larger forces at work. For example, every year we have dozens of summer internships uniformly organized as 10-12 week projects aimed at a conference submission. This helps shape the conferences and trains students to adopt a "next conference" style of research planning.

Larger forces are set in motion by our adoption of highly selective conferences. Increasing acceptances to 35% would make CHI more interesting (and undercut some of the smaller conferences), but it probably would not lead to original important ideas being accepted, because original work is inevitably rough around the edges. What is needed is a shift in reviewer mindset from "what should be fixed in this work?" to "what would be interesting to discuss in this work?" CHI is very insecure, and feels that if we accept rough but interesting and important work, we will be seen as not rigorous and lose the respect of Computer Science colleagues. We are apparently not strong enough to tell them that in a dynamically changing world, we can't afford to stick to what is iterative and safe.

Scott Carter said...

My N is exponentially smaller than yours, James, but I and my colleagues had the same frustration even as graduate students. It seemed clear early on that system papers required not only development, debugging, design iteration, etc. but also extensive studies, while in papers we submitted that were only study-based the study itself was often held to a lower standard than those required in a systems paper. That makes the system papers something like 5 times as time-consuming. We could see that meant to have success we would need to write a certain kind of paper, but most of us didn't want to do that kind of work.

I think this is self-evident enough that the "future of the field" has been already largely ceded to industry (read startups). In particular with the lowering of the bar for web- and mobile-based development, I think the startup community may now be a more vibrant arena for novel technologies than the academy.

One way forward is for system designers to embrace the aesthetic of makers, rather than try to shoehorn their work into scientific models. Perhaps another conference would help, but it seems as though, as you suggest, the methodology itself needs to change. This is 2009, after all. Having a rigid review process that ultimately hides work behind paywalls seems antiquated. Why not release papers (like systems) early and often on something like arXiv, rapidly incorporate feedback, allow voting, and create a journal to highlight papers that emerge as polished and well-respected? One could imagine taking some ideas from the myExperiment project and linking code to the drafts. This might also mitigate some concerns Buxton brought up in his 2008 paper regarding reproducibility.

Bo said...

Bill: "For what was missing, I just went elsewhere." - What are those other venues? Might like to give them a try.

Jofish: Enjoyed reading your discussion. I think it is that kind of specialization of interest that damages CHI. Personally, I'm not an ethnographer, or a psychologist, my graphic design skills are abysmal (though I like to think my interaction designs are adequate) and although I have a CS degree, many of my tenured non-HCI colleagues would not see my work as CS research (little math, little simulation, little performance metrics, a few algorithms, etc.). Sometimes, insecurity makes me wonder if I was drawn to CHI as an avoidance of the tougher standards I'd need to meet to publish in anthropology, psychology, design and other CS conferences/journals.

But I remember well what drew me to the CHI/UIST community in the beginning. It was not that these conferences welcomed multiple disciplines, but that each researcher was her/himself multi-disciplinary, multi-talented and combining the insights as appropriate for the problem at hand. A place where the same person would know the major contributions of both Goffman and Knuth. A kind of renaissance approach toward invention in the mold of da Vinci.

As you characterize, my sense is that most of the community seem to see themselves as grounded in one discipline or another and that their interests are marginalized and that the other disciplines don't "get" their domains' style of contribution. To me, that's the heart of the problem - we shouldn't be a collection of disparate disciplines but the intersection of people who cross multiple categories in the research they do. Such people do still attend CHI, but they aren't the ones with the most papers.

Florian Michahelles said...

Hi there,
I'm really impressed by your post, James. I'm even more impressed that most of the comments tend to agree as well. Additionally, I just had the same feelings over the last few days. At MobileHCI09 in Bonn, when half of the visitors hid away in order to finish their CHI submission, there was a running gag "Let's just submit in order to make use of the comments for a revised version for next MobileHCI...".

Let me use the opportunity to place an ad for innovative CHI-rejects:
Tokyo - Internet of Things (IoT 2010), a three-day public conference, gathers leading researchers from academia and industry to elaborate on the major themes of the emerging Internet of Things.


http://www.iot2010.org/cfp/

Be sure to mark your calendars if you want to know more about:


* Green by Internet of Things / Green of Internet of Things Technology
* Design of future sustainable technologies linking the physical and virtual world
* Novel services and applications to facilitate environmental responsibility
* Emerging Internet of Things business models and process changes
* Communication systems and network architectures for the IoT
* Experience reports from the introduction and operation of networked things
* Emerging applications and interaction paradigms for everyday citizens
* Social impacts and consequences


Important dates
Paper submission due: March 15, 2010

Workshop proposal due: March 20, 2010

Scientific contributions are expected to be published in the Springer LNCS series.
We are particularly interested in work addressing real-world implementation and deployment issues.

Join http://www.facebook.com/business/dashboard/?ref=sb#/pages/Tokyo-Japan/Internet-of-Things-2010-Conference-IoT2010/162847214406

marc said...

As the program chair for HFES, can I suggest giving us a try? We are meeting next fall in San Francisco, so we are specifically making a push to attract the more dynamic and exciting work that you are all talking about. The current Internet and Computer Systems programs at HFES are really small, in part because many people in recent years had similar complaints about us (people really do leave conferences when they get disappointed).

But as program chair, I can attest to the fact that a strategic shift is in the works. But we need the submissions to make it happen. Perhaps this is an ideal partnership. You send in the proposals and I will see that they get reviewed by people with the appropriate criteria.

GeneG said...

I certainly share your experience with respect to CHI. My sense is that we are trapped in a local maximum, trying to optimize too much without encouraging more diversity to help us break out of that maximum.

One possible way to change this is to recognize reviewing as a first-class activity.

Jeff said...

Good post James. Having lived in Silicon Valley for several years now, I have to say that I agree with Scott: industry is driving the future of computing more than the academy right now. When people think about who is driving the user experience forward, they think of companies like Apple, Google, Facebook, and Twitter. I further note that companies that do the most publishing in academic forums (Microsoft Research, IBM Research, etc.) are regarded as the least innovative in the marketplace. At UIST this year Microsoft Research published a paper examining different approaches to building multi-touch mice. Apple built and shipped one.

I personally find myself increasingly giving up on CHI and even UIST because I find them irrelevant to the future. I still enjoy talking with my fellow attendees, but I see too many papers published just to publish, studying systems industry has already deployed, or looking at such a narrow slice of a problem that the results are meaningless in real life. I encounter more innovation scanning Techmeme these days than I do at the average conference.

I haven't completely given up on academic forums like CHI and UIST yet, but if / when I do I suspect it'll be to jump even further toward the industry side. When the technologies we've put in place make it easy to share our prose and code with the world, why depend on the opinions of just 4-5 people to determine what is worthwhile?

Mary Czerwinski said...

I agree with Saul, it's up to us to change things. That's why I agreed to be PC for UIST this year, even though I've found the conference to be largely irrelevant the last 5 or so years now, with incremental interaction techniques, just as James described. I brought this up at the conference meeting and I think people were aghast that I considered trying to shake things up. I don't want to change the conference per se, but I do want to invite more innovation and see bigger problems tackled. Anyone who is posting here, please contact me if you'd like to be on the committee. Thanks.

Anonymous said...

As a second year master's student I sit on the edge of the industry versus Ph.D. decision. I must admit that one of my largest considerations is that the work I would want to pursue doesn't seem to fit well into the academic mold of CHI/UIST (I didn't even submit to CHI 2010 so this isn't a sour grapes note). I first note that my novice status means that my opinions are certainly not grounded in experience, but are the first impressions of a novice member of the community.

While systems cannot be built for the sake of building systems, it seems that the emphasis is largely on the evaluation. Where is the emphasis on design and creativity? Where is the emphasis on collaboration?

In a culture where you are expected to publish at least once per year, how do you go about systems work? How do you collaborate with others on systems work (particularly other students) when only one person can be first author on the publication of said system? It is great to encourage interdisciplinary work, however it seems everyone doing interdisciplinary work has to pretend it is all about the science.

Frequently I read survey/interview papers where the methodology is rock solid; however the majority of the design implications seem self-evident. I frequently question if a designer was given a few hours to brainstorm would they come up with over half of those design implications? Furthermore, if the authors were given a few hours to brainstorm prior to running the study would they come up with over half of those design implications? I once heard a great HCI researcher propose something like the following. For every study that generates design implications, have the authors sit in a room and brainstorm for a few hours and write down all their design implications. They then place this in a closed envelope and revisit the list after their study.

Where do you draw the line between a designer and an HCI researcher? Do the means justify the ends in HCI research? If a designer and an experimental study can yield the same insight, why is the emphasis on ensuring that the result was obtained through the most rigorous means possible?

The ideas I want to pursue require the complexity, knowledge and experience of a Ph.D., however I am not convinced that I can exist within the current academic system.

Sadat said...

As a first-time AC for CHI 2010, I experienced a wider view of the reviewing process than I normally would. I think the sub-committee approach is a step in the right direction, but still doesn't address the methodological biases that many in the CHI community still hold. Some want a controlled experiment and do not appreciate field studies, some want a theoretical framing while others think theory does not add much.... the list goes on. These biases exist even within sub-committees. At the end of the day, acceptance at CHI seems to hinge on a roll of a dice in getting 3-4 reviewers that agree with your methodological stance in assessing your contribution. We claim to be an inter-disciplinary field, yet I've seen an unhealthy under-appreciation of methodological orientations that differ from a reviewer's own. To move forward as a community, we need to acknowledge that there is no *one* good way to build/study.

Ben Bederson said...

My feeling about publication has always been that the focus on quantity (as James said) is fundamentally wrong. After all, you are what you measure.

The only solution I can think of is to only allow, say, 2 publications per year per person. Since you can't control supply (i.e., a global watchdog would be ridiculous), control demand.

Simply only allow 2 papers per year (of the author's choosing) in any hiring or promotion case. The world would dramatically shift. All papers would almost immediately become thoughtful and carefully crafted. There would be no random fluff, and only the occasional nut job. People would put their very best work forward.

----

Ok, this is completely impossible for any number of reasons, but one can dream, can't one?

James A. Landay said...

Jeff, glad to hear about changes at TOCHI. I'm happy to see Shumin and others (like yourself) are going for that type of speed. That will help no matter what happens with respect to CHI/UIST, but I still think we need more radical change.

Regarding going to a new community. I'm more likely to start a new one rather than go to an existing one.

I agree that rebuttals are important (in fact I started rebuttals in the CHI community when I was the program chair of UIST and it caught on -- note I copied it after hearing about it from SIGGRAPH friends).

James A. Landay said...

Saul, I'd love to see that talk on the importance of systems research to HCI! PS Shouldn't you be on the beach?

James A. Landay said...

Bill, again great comments... you guys keep talking me off my ledge.

James A. Landay said...

Jonathan's comment: "What is needed is a shift in reviewer mindset from 'what should be fixed in this work?' to 'what would be interesting to discuss in this work?'"

This is exactly how I feel. I'm not sure it is about feeling insecure with CS colleagues as much as it is feeling insecure about ourselves as "scientists." I'm in an engineering field as far as I see it. I have no qualms.

James A. Landay said...

Jeff says: "I encounter more innovation scanning Techmeme these days than I do at the average conference."

Again, this is what I worry about. There is very little incentive to innovate since the bar is fairly high.

Folks, this happened in other communities before HCI (my friends in networking, systems, and architecture have complained about it in the past). What did they do? Started new conferences.

James A. Landay said...

Anonymous: Come work with me! I'm glad to see a student thinking this way. "In a culture where you are expected to publish at least once per year, how do you go about systems work? How do you collaborate with others on systems work (particularly other students) when only one person can be first author on the publication of said system?"

I think it is hard to publish systems work at that rate given the current bias towards systems work needing a motivating study, a complete system, a tight experiment, and a real deployment all in one little paper! I'm less worried about the 1st author issue. That can easily be handled.

James A. Landay said...

Ben,

It is funny that you make this comment about restricting folks to two papers. I actually had this exact conversation about incremental research with Jeanette Wing (head of CISE at NSF and former CMU CS department chair) a month ago and she suggested something like restricting folks to listing 5 papers for tenure and maybe 2 for hiring a new PhD. You can publish all you want, but we are only going to consider this small set in these decisions. This would have huge impact overnight and it is doable! It would just require the top 4-5 schools (at least in CS) to do it. The rest would quickly follow. She seemed interested in writing a CACM piece on the idea.

James A. Landay said...

Scott writes: "That makes the system papers something 5 times as time consuming. We could see that meant to have success."

Scott has his PhD but he noticed this trend as a student. This is what worries me the most: graduate students deciding that the odds are so stacked against systems work that they choose a different style of work than what they truly believe in. I've seen it firsthand at UW. I also have a lot of emails from folks telling me the different ways their papers have been treated when they submit, for example, systems work vs. interaction techniques (btw, this is my standard example but I do not intend to pick on this type of paper -- it is a convenient type to contrast since it also involves building something).

James A. Landay said...

I'm glad Mary is taking on UIST, though I heard from others that the reaction to her ideas for change wasn't great. That is why I worry that the UIST community may not be salvageable at this point. The UIST community is now composed of a different set of people and I'm not sure they agree with the point of view I've been pushing here.

David Karger said...

I think the way to address the acceptance process is to think about why we have conferences. I attend conferences to be inspired with new ideas or problems, to inspire others with my own, and to learn new techniques or tools that might help make progress on the problems I care about. I want to see papers I'll end up debating with other conference attendees. I'm not there to see good, "worthy but dull" work spent proving something everyone suspected all along, or doing something everyone agrees was an obvious good idea. This kind of work should appear in journals.

The review process's emphasis on making sure a paper has no weaknesses misses this point and forces defensive/conservative research. If, instead of saying "we want to accept the best work" we said "we want to accept the work that makes attendees think" I suspect it would make quite a change. For example, at the ISWC conference I recently chaired, our rule for acceptance was "after discussion, if any AC thinks this paper should be accepted, then it is". This encouraged variance rather than safe averages, and aimed at putting debatable papers at the conference so they could be debated.

Another example worth looking at is CIDR, the Conference on Innovative Data Systems Research (http://www.cidrdb.org/). It was founded by senior figures who felt the main conferences had become too focused on a particular, "appropriate" type of database work. Their founding statement speaks so directly to the issues raised in this blog post that I include it here: "The Conference on Innovative Data Systems Research (CIDR) was started in 2002 by Michael Stonebraker, Jim Gray, and David DeWitt to provide the database community with a venue for presenting innovative data systems architectures, as well as a prestigious publication opportunity. CIDR does not compete with the established conferences presenting rigorous treatises in established areas; rather its goal is to air radically new ideas. Such papers are typically speculative or are system evaluations. The visionary papers usually lack rigorous frameworks, simulations of performance, or prototype implementations but present a radical departure from conventional approaches that enable new applications. The prototype descriptions generally are a detailed report on successes and mistakes. The other major DBMS conferences usually reject such submissions because they are not scientific. However, these are often the very papers that offer long-term value to the field, and should be widely disseminated."

I have found this conference to be one of the best I attend. A tremendous fraction of the papers meet the objective I set out at the beginning, inspiring me to think about new problems or to think about my own problems in new ways. One characteristic of this conference, though, is that it is small---only a couple hundred researchers. I do not know whether its model can be replicated at the scale of CHI.

Andy Ko said...

James, I think what underlies your concern is that our incentive structures are mismatched with your desire to do systems with relatively long time horizons and significant upfront investment. Johnny Lee and I chatted about this issue at UIST for a good hour or two, discussing the interaction between DIY cultures, industry, academic work, and the role of NSF funding. Johnny felt that NSF shouldn't really be funding most HCI work and I felt they should, but had yet to articulate why. After our discussion, we converged and agreed upon a few basic insights:

First, we saw three major types of systems work in the history of HCI:

(1) Systems that compete with the market (e.g., MS multitouch mice vs. Apple's mouse). If we think about the role of research in the world, this work is arguably unnecessary, since the market will be much more efficient at exploring and deploying these systems. This is the concern that Jeff mentioned, and I think he's right. If you want to play with what's possible today, the incentives and resources in academia are suboptimal. Go play in industry or industry labs.

(2) Systems that have a perpetually small market (e.g., assistive technologies). This research is important because the market won't do it. Although, like Mary and others, I tire of interaction technique papers, I do see a place for interaction techniques that tackle the fundamental challenges of motor and sight impairment.

(3) Systems with no market, but with the potential to create or change one. If there was anything we as researchers should be doing, this is it.

I can't give any examples of the last one because, as you lament, there's no incentive to pursue them. NSF program officers want us to propose these systems, but panels find them too risky. CHI and UIST luminaries want to see them, but reviewers don't give ACs the scores they need to accept them. Ph.D. students want to do them, but not at the risk of forfeiting their career. Like Jonathan said, there are larger forces at work.

Starting a conference only addresses some of these incentive issues. For example, you could rethink the role of the PC and reviewers, like David did with ISWC, you can change the page limit, you can up the acceptance rate. You can create a conference with the right incentives for riskier systems work.

What you can't change so easily is the incentive structures of NSF and hiring committees. But I think these things are changing, albeit slowly. For example, one reason I chose a faculty position at an iSchool is to be unconstrained by the more rigid notions of quality in CS and CS-affiliated departments. I do have the problem of recruiting good hackers, but if enough faculty show up at iSchools, the students will come. And this is already happening at UW: prospectives are seeing that technical students like them find jobs at iSchools, where they're rewarded for doing the work they want to. Technical faculty at iSchools do shoulder the responsibility of setting examples, in spite of our tenure concerns.

I support the idea of a new conference; I also support the idea of changing CHI or UIST. But we won't see much of the riskier systems work unless we also make a coordinated effort as an academic community to set good examples. If this means our students can't compete for jobs at Berkeley and Stanford, so be it. Let them go to Michigan, UW, Irvine, CMU, Georgia Tech, Indiana, Wisconsin, and the myriad of other places that have made a sustained or newfound commitment to HCI.

Anonymous said...

So many exciting ideas about how to fix these problems -- I only hope we can muster the courage and support to try something new. In Andy Warhol's words: "They always say time changes things, but you actually have to change them yourself"

Anonymous said...

I've published plenty of papers at CHI and similar conferences. Of these, maybe two have had a real impact: becoming routine references, changing the way people talk in academia, mentioned frequently in the media. And here's what bugs me: these two papers *barely got in*. They just squeaked by in the ratings.

I've been lucky, but I wonder which great papers from other people I've missed because they had just one more cranky reviewer?

I completely agree with the commenter who said we need to change the reviewing mindset to judge papers on the merits, not their faults.

Jonathan Grudin said...

Great discussion. When facing change and uncertainty it is tempting to pine for the good old days, in this case to limit people to publishing a couple papers and submitting a few for appointment and promotion cases. This is how it was in the past, when rarely did anyone co-author more than two papers at a conference and when only journal articles were considered in academic deliberations. We were pushed away from that in CS in the US by several strong forces that are still in effect. The strongest is arguably the fact that conference papers are now archived and widely disseminated (and dissemination is economically significant). This seriously undercuts journals -- historically the drivers of journals were wide dissemination and archiving, not quality. This set some dominoes toppling and led to where we are. (Yes, there were also other factors.) Rolling this back is not impossible: Other fields operate differently. But it will be far trickier than people realize. Identifiable forces push us further away from that path. More likely we will instead find a new path forward.

John said...

A mechanism is already in place within CHI (starting with CHI 2009). We just need one or more new review tracks. Of course, breaking down various aspects (a la JoFish's comment) may lead to an unmanageable number of review tracks.

Merrie said...

I think that Jonathan’s and David’s point about changing the reviewing mindset from one of “what’s wrong with this work?” to one of “what’s interesting about this work?” is an important one, and one that is practical to begin addressing in the short term.

For example, I have often seen high review scores (e.g., a “4” or “5” on the CHI scale) effectively discounted, because the accompanying text of the review does not focus on the reasons behind the high score, but instead focuses on obscure improvements the reviewer would like to see, or on future work the reviewer thinks would be interesting but which is not covered in the paper. Although such comments are meant to benefit the author, reading this review gives the AC the impression that the reviewer in fact did not like the paper after all, and that there are significant improvements standing in the way of publication, even though this was not the reviewer’s intent.

I think that simple changes such as changing the structure of the reviewing form on PCS might help reviewers to avoid these types of pitfalls. For example, instead of having one box that says “write the review here,” we could have a more structured form that asks the reviewer to fill in answers separately to several questions, such as:

1. “What did you like best about this paper? What new things did you learn from reading this paper? What insights did you find most interesting?”

2. “Do you have suggestions for how the authors could improve this paper, such as minor clarifications?”

3. “What, if any, major concerns did you have about this work?”

4. “Are there areas of future work that you would like to suggest to the author?”

This might help some reviewers avoid the pitfalls of omitting point #1 (the positives) from their reviews, and help ACs to better separate which negative comments are seen by the reviewer as barriers to acceptance (#3) versus which are intended merely as helpful suggestions to the authors (#2 & #4).

Of course, I know that this type of structured review-writing won’t solve all the problems James brings up, but it might be a relatively easy first step to take in an attempt to improve the CHI reviewing process. Making all acceptances “conditional” on a final check by the ACs that the most important improvements were in fact incorporated into the camera-ready (in the model that UIST often uses) would also help, by making reviewers confident that clarifications suggested under item #2 would be verifiably addressed, and would therefore not need to be reasons for rejection.

Scott Counts said...

Great post and thread. Thought I’d chime in with an additional thought on the reviewer mindset issue (“what should be fixed” vs. “what is interesting”, etc.).

Sometimes I get the sense that we (I’m sure I’ve done this although I try not to) forget that there is no such thing as the absolute perfect piece of research, or at least such research is exceedingly rare. It’s easy to criticize, to be destructive instead of constructive. My take on why is that it is learned behavior that starts in grad school as sort of the flip side of well-intentioned and legitimately rigorous training. I was a psych student and honestly we are trained to be hyper-critical, to hunt down any little crack in the argument/analysis armor. The result is that a) the focus of peer review is on evaluating the “truth” of the work rather than other forms of contribution like how innovative it is – evaluating the evaluation more than the work itself, and b) work that is generally solid and worthwhile is rejected because a small percentage of it has issues. As others have pointed out here this can be counterproductive in the rapidly changing world of technology, especially when you are competing with market forces in industry that can iterate ideas very quickly.

BTW, I like Merrie's suggestions for actual and near-term changes to encourage constructive feedback.

Anonymous said...

Wait, so you think you should get an "A" for effort? It's the quality of the research results, not the amount of energy you expended getting there, that should count at any decent peer-reviewed conference.

James A. Landay said...

Anonymous, please don't hide behind anonymity so that you can post a snarky comment.

But, the underlying confusion inherent in your comment is legitimate: "It's the quality of the research results, not the amount of energy you expended getting there, that should count at any decent peer-reviewed conference."

I'm not saying people should get an A for the amount of effort. I'm saying that:

1) the bar that reviewers are requiring for systems-oriented work to be considered a "research result" is wrong, not just too high,

2) if you are going to compare people in the same field for jobs/tenure/best paper awards (as is clearly done), having one part of the field be required to do 5x as much work/publication is truly an issue that must be resolved.

3) systems work will always take longer to bring a project to completion. Are there contributions that can be published along the way? I believe there are, but the reviewers do not seem to see it this way.

Your comment oversimplifies a very complex issue. The issue we need to explore is about the value of the work, not just the quantity or even the quality. For example, a high quality study that offers little value should not be 10x easier to publish than a higher value, innovative system/idea whose true quality may not be known for years (e.g., after large, long-term deployments or uptake by industry, etc.)

Anonymous said...

I think my main issue with the CHI/HCI community is that it has lost the desire to solve interesting problems. One of the draws to this type of work is finding novel solutions to problems that many may not have even envisioned. I think the community has lost touch with what Licklider and Engelbart pioneered.

We should be investigating how we can use computers and technology to enhance ourselves and our connections to each other. If Twitter or Facebook were research projects that attempted to publish at CHI they would have been rejected. Yet they probably have had more impact than all of the published CHI papers in the past decade. This is impact not just measured by the number of users but by how they have transformed the way we think and how we interact with each other through technology. They extend the horizon of possibilities and feed our imaginations.

That is what this community should be about, the exploration of how we can use computers and technology to transform the way we work, play, and think, both alone and with each other. This intrinsically requires a more creative approach that may not be scientific, but you can still perform scholarly research that is not science. Unfortunately the community’s overall insecurity with itself and its work will probably be its undoing and all we’ll be left with is a pile of t-tests, chi-squares, and ANOVAs that provide as much value as the paper they were printed on.

Stacy Branham said...

Anonymous says: "Unfortunately the community’s overall insecurity with itself and its work will probably be its undoing and all we’ll be left with is a pile of t-tests, chi-squares, and ANOVAs that provide as much value as the paper they were printed on."

I think this is exactly the kind of mudslinging reminiscent of the Science Wars that is inappropriate and one symptom of the underlying problem that is being discussed in this thread: intolerance of complementary research methodologies and contributions. There are a number of intelligent, well-respected researchers on both sides of the positivist-phenomenological divide. Instead of aiming to snuff out one of these factions or others (e.g. with regard to systems contribution, as James suggests) in the CHI community, we ought to be thinking about the role education can play in breaking down biases in the review process (the approach taken by Harrison, Tatar, and Sengers in their "The Three Paradigms of HCI" alt.chi paper). Varied research methodology or contribution does not connote varied validity.

Anonymous said...

I too am sitting on the cusp of deciding between transitioning from an M.S. to a PhD or going into industry. One of the biggest problems I see is that CHI/UIST publications are taken into heavy consideration in the process of evaluating your PhD work. Your credentials and the criteria for hiring come down to "counting" the number of solid publications you have. This means that a student becomes restricted to a handful of conferences (CHI/UIST) that force them to work inside a system. When I am planning my next project, as a graduate student, I am forced to think strategically. What work has the best chance of getting into CHI? This means that from the outset, before I have even thoroughly thought through my ideas, I have constricted myself to doing work that is based on expectations set forth by the CHI community. These are the expectations that lead me to pick projects that are 6-10 weeks in length, and are based around studying interaction techniques, asking interesting research questions and validating them with a “pile of t-tests, chi-squares, and ANOVAs”, or interviewing 15-20 domain specific subjects for retrieving qualitative data (that often results in “self-evident” design implications). I am not trying to de-value this type of work or methodology, but am simply emphasizing that we are placing ourselves inside a box that is less about creative output and more about working a system.

As Sadat mentioned previously, “at the end of the day, acceptance at CHI seems to hinge on a roll of a dice in getting 3-4 reviewers that agree with your methodological stance in assessing your contribution.” One astonishing pattern I have seen in my department is that a single HCI student will do several 6-10 week projects and submit them all to CHI, primarily to better their chances of getting one paper accepted. I really liked the idea of “restricting folks to listing 5 papers for tenure and maybe 2 for hiring a new PhD.”

Great discussion!

Rick Wash said...

James, I am quite sympathetic with your complaint. I find it really interesting because it is kind of the *opposite* of what I'm seeing in my section of CHI. I'm more in the CSCW end of CHI, and we have the same problem with reviewing: reviewers focus more on the methodology (what stats did you do?) than on the contribution in determining their ratings. But this problem yields different results here: too many papers are interesting "first steps" in a project that never end up with second steps. People look at Facebook, or Wikipedia, or something and come up with maybe one or two interesting ideas. But they never follow up on them, and move on to the next "new hotness" website. Full projects that follow up, taking the idea, testing it, putting it in a system, figuring out when it doesn't work, etc. don't happen. In systems, you can only get a "finished, multi-person-year" project published; in the social parts of CHI, you can publish "first steps" without ever getting around to doing the rest of the project to make that information useful.

As a first-time CHI AC, I'm having a lot of trouble getting reviewers to even discuss the "contribution" of a paper. Everyone focuses too easily on the pieces that are easy to pick apart -- the stats, confounds in the evaluation, the lack of a pre-study, etc. But even when prompted to discuss contribution, people are very hesitant. They still prefer to fall back on "this is a flaw" rhetoric, rather than focusing on the "what does that flaw mean for what the paper is teaching us" question. I think it is partially because focusing on "flaws" is easy, but I think at least as important, people feel like they shouldn't try to judge the "contribution" of papers. Saying "this paper makes a good, valuable contribution" feels like going out on a limb in a way that "this paper has flaws X,Y,Z" doesn't. And it is this reluctance that is causing the community to focus on little flaws, and requiring the authors to do more and more work to justify a paper.

I like the suggestion that was made to specifically prompt people to discuss the "contribution" of the paper, or what they liked about the paper. BUT, the form already includes a box for "describe the contribution"; I'm quite sure that the text in that box is almost always shorter than the text in the box for the rest of the review. I don't think prompting people is enough; we need to give them a reason to take that box seriously. I just wish I knew how. I do think, however, that this is one problem that is endemic to "interdisciplinary" research like CHI (a reluctance to judge work different from one's own) that might follow you to new conferences. (Indeed, many of the current CHI spinoff conferences like UIST, CSCW also have it.)

Tao Ni said...

A major frustration comes from how many unprofessional, uncommitted, and irresponsible reviewers are out there. Many reviewers start with a tendency of _rejecting_ a paper, rather than _evaluating_ a paper. I think this leads to many reviews that only focus on flaws, rather than contributions. How many reviewers out there spend as much time on reviewing a CHI submission as on comprehending and appreciating a published CHI paper? Just look at how many reviewers put aside the duty until the deadline. Maybe most only spend an hour on a paper before throwing out their critiques. With such a short time, of course, it's easier to find flaws than to really interpret what the authors want to contribute.

John F. Patterson said...

I share the feeling that Systems work is being slighted and harmed by the "value system" that CHI/CSCW foster. People with creative ideas are distorting their work to accommodate the perceived need for an empirical study. If you will indulge me, I will offer a story, a suspicion, and a suggestion.

The story comes from my days as an AC for CSCW. There was this one paper with a truly innovative system that was prompting many to accept it (including the psychologists.) But then, it also had a study that was truly dreadful. In the end, we could not accept it because the study was embarrassingly bad and not something that we wanted associated with the conference.

I have always thought that the authors would have been better off simply putting their system in front of a few people and describing what happened in a very informal way. If they had done that, the paper probably would have been accepted for its novelty. By stretching themselves to do an empirical evaluation, they brought a new set of evaluative criteria to bear, which they did not adequately understand.

So, why do so many of us feel we need to do a study along with the system?

My suspicion is that it is actually less critical that there be a study than that there be evidence that the system was deployed and used. Most of us (including me) have developed systems that achieve demo status, but cannot quite be left in the hands of users. This might be done because the missing pieces are well understood and will soon be implemented. Just as easily, however, there might be conceptual flaws that have been left unaddressed.

It is an imperfect hurdle to impose, but as a reviewer, I feel comforted by knowing that the system under review has stood the test of users. Without that test, I feel like I am asking myself not only whether the system is interesting, but also whether the authors really know how to implement it. With the test, I relax a little on the second question.

Of course, once there are users involved, our report of a new system comes under the watchful eye of psychologists (which I am, in case it's important.) Now it seems no longer sufficient to simply provide descriptive results. There needs to be an hypothesis, a comparison, and all the rest.

All that machinery has its place, but perhaps the first outing of a novel system is not the place. Could we, perhaps, consider it sufficient that the first use of a novel system be described in merely descriptive terms? We would still have users, but the results would simply indicate whether they found the system difficult, useful, fun, or whatever.

OK. It's time for the proposal.

Could the psychologists in our community develop a questionnaire to administer to the users of a novel system? Let's call it the CHI Early Deployment User Questionnaire, CEDUQ for short. (I will happily yield naming to someone who is more clever.) It would be a very general questionnaire that could be applied in almost any early deployment. A novel system would be made available to a collection of users, who would be asked to complete the CEDUQ after using it. Then, the system paper could be submitted with the descriptive results of the CEDUQ incorporated. As far as I am concerned the CEDUQ results could be in an appendix and only lightly discussed. Indeed, should the authors try to draw conclusions from the CEDUQ, they would do so at their own peril. The real purpose of the CEDUQ is to offer confirmation that the system has achieved a critical test hurdle, i.e., that it can be placed in front of users.

Good idea or bad idea? I'm not sure yet. Perhaps if it is too loose for CHI and CSCW, James will want to think about it for his new venue.

Jakob Bardram said...

This is indeed a very welcome thread - and an important one judging by the length of it. I'm one of the CHI/UIST/CSCW/Ubicomp systems builders and sometimes feel I'm in the same "trap". Here are a few reflections.

1. The idea of only using 2-5 papers in a tenure case is actually practiced here in Denmark and it works well. You submit your best work for the evaluation.

2. As a co-PC chair for CSCW 2011, we are strongly encouraging more systems papers!!! The problem is that we see fewer and fewer of them. So - everybody, please submit your systems papers to CSCW in 2011!

3. As a general co-chair for Ubicomp 2010, we are also looking for more systems papers and DEMOs! And we really would like the acceptance rate to go up a bit since we share the same concerns as expressed by Jonathan (Grudin).

4. We should start building systems together - James (and others). I do a lot of work w. medical and biology researchers, and they are very good at working together on very large projects. This also means that the author list for e.g. a Nature or Lancet paper gets rather long. But the paper also covers a lot of research.

And - I agree; "we are CHI" and hence need to fix the problem ourselves.

Jonathan Grudin said...

One more comment before we hurry to adopt John's charming "CEDUQ and tell us if it quacks" idea. An alternative explanation for our observations could be examined empirically.

We hear: Our systems people feel their papers are discriminated against. Our qualitative researchers feel their papers are. Designers feel that good design papers are excluded. Practitioners feel their papers can't get in.

Is any of this true? There are probably MORE papers in EACH category getting in. In 1982, 75 papers were presented and in 2003, 75 papers were presented. In the two decades between the median was 59. But with more submissions and rising acceptance rates starting in 2004, the dam broke and last year 277 papers and notes were accepted. A four-fold increase probably floated all the boats.

If so, our problem may be that we see that 75%-80% of the work in our category is rejected, including stuff we like. Well, yes, and same with everyone else. For the program committee to discuss potential positive contributions might be fun, but their principal task is the not-so-fun work of deciding which 80% of the papers will be rejected and how to break this news to unhappy people, many of whom just had the bad luck to get a less-than-enthusiastic AC.

If more systems, design, qual, usability, and other papers are accepted now than before, which seems likely, plausibly not much novel work appears in each category, because a paper on an established topic can draw on established research rationales and reference lists.

In my experience, when acceptance rates rise to 45% reviewers can start flipping the bit to focus on what is interesting in a paper, although many now have engrained nit-spotting habits. You could still have Best Paper Nominations that pick out the top 20% to keep academic committees informed of the best of the best. But unless CS as a whole shifted, this wouldn't be practical for CHI I'm afraid. And if CS as a whole shifted it might also stress the ACM Digital Library folks, who benefit from hosting highly selective conferences. Eventually I think we will find a new path forward.

Manuel A. Pérez-Quiñones said...

James (and the rest of the community), great post. We need to keep making noise on this. I published several papers at CHI back when I was a grad student but have had 0 luck since. The reviews are always odd, strange, and by far irrational. I keep trying every couple of years, foolishly thinking that it will change.

Last year I had a paper rejected based on two main arguments. First, I failed to cite a paper that was not published yet. The paper was to appear 6 months later in a journal. The only way I found the paper was by googling the title of the paper as suggested by the reviewer. Now I have to be clairvoyant to publish research.

The second comment was that I discuss research area X in my findings but it was not discussed in the previous work. I searched my paper and that topic was mentioned only in one paragraph of the paper. That one paragraph was clearly pointing out "an unexpected finding"... something that was not in our sights and was unexpected. One paragraph out of a 10 page two column paper. The reviewer was offended that I didn't cite more work on that area.

In both cases, my take away message was that I clearly did not stroke the ego of someone out there the proper way. That was all it took to get my paper rejected. Don't mind that I could have added 2 sentences and addressed both of these comments. They did not care that the methodology and the findings were all strong and well liked by the majority of the reviewers.

In my opinion, CHI reviews are no different than a popularity contest in high school.

Chinmay Kulkarni said...

While we're at it, why not be adventurous? Do we really need a conference that has a rigid and anonymous reviewing process?

Here's my admittedly naive system:

1. Make it a popularity contest (as someone suggested CHI already is). Authors simply upload their "work" online and folks vote on it.

2. Voting is *non-anonymous* and has a textbox which asks "Why do you like this idea?". IMO this partially solves the "Oh, they rejected my paper because I didn't stroke someone's ego right" problem: you can no longer hide behind anonymity and let your ego do the reviewing (I'm not saying this ever happens: but by making it impossible, we encourage people to look for the *real* reason why their work was unappreciated).

3. Not every vote is equal. Your credibility as a voter who can distinguish good work from bad is determined by a pagerank-like algorithm. This also addresses the "reviewer expertise" problem to some extent. No expertise? Low pagerank.

4. You simply choose the papers which have the highest number of normalized votes in a year and publish them in the proceedings.
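A minimal sketch of how steps 1-4 might fit together, in Python. Everything here is an assumption made for illustration rather than part of the proposal itself: the function names (`credibility`, `rank_papers`), the idea that a vote for a paper counts as an endorsement of its authors, the damping factor of 0.85, and the reading of "normalized votes" as "votes weighted by the voter's credibility".

```python
def credibility(people, endorsements, damping=0.85, iterations=50):
    """PageRank-style power iteration over a voter -> author endorsement graph.

    people: list of everyone who votes or is voted for.
    endorsements: dict mapping a voter to the list of authors they endorsed
    (hypothetical structure; repeated entries act as heavier edges).
    """
    n = len(people)
    rank = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in people}
        for p in people:
            targets = endorsements.get(p, [])
            if targets:
                # Split this person's credibility among those they endorsed.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Someone who endorsed nobody spreads credibility evenly.
                share = damping * rank[p] / n
                for q in people:
                    new_rank[q] += share
        rank = new_rank
    return rank


def rank_papers(votes, authors, top_k):
    """Return the top_k papers by credibility-weighted votes.

    votes: dict paper -> list of (non-anonymous) voters.
    authors: dict paper -> list of authors.
    """
    people = sorted(
        {v for voters in votes.values() for v in voters}
        | {a for names in authors.values() for a in names}
    )
    endorsements = {}
    for paper, voters in votes.items():
        for voter in voters:
            for author in authors[paper]:
                if author != voter:  # ignore votes for one's own paper
                    endorsements.setdefault(voter, []).append(author)
    cred = credibility(people, endorsements)
    scores = {
        paper: sum(cred[v] for v in voters) for paper, voters in votes.items()
    }
    # Highest score first; ties broken by paper name for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


if __name__ == "__main__":
    votes = {"A": ["x", "y"], "B": ["z"]}
    authors = {"A": ["z"], "B": ["x"]}
    print(rank_papers(votes, authors, top_k=1))
```

One consequence of this particular reading is that a voter nobody has ever voted for still carries a small baseline weight, so newcomers are not silenced entirely; whether that is desirable is a design choice the proposal leaves open.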

Obviously, this may not be a system which will work in its current form. With improvements, it might; someone observed that Techmeme has more innovation than a conference. Could this be the way to reverse the trend?

Good idea? Bad idea?

Anonymous said...

Hi James, thanks a lot for speaking up for junior researchers like me. I want to say that I have a very similar experience this year (I only submitted one "HCI system" paper and one "novel interaction technique" paper though). Instead of posting another "me too" comment here, let me be a little bit more constructive and here are my two cents on why it's happening like this and how we might change it next time –

Possible cause – CHI reviewers are supposed to be domain experts, so previous publication records at CHI are usually necessary to become a reviewer (e.g. in the pcs volunteer center, each reviewer needs to provide both his/her previous publication record and previous reviewing record). It's clear that there were more tightly controlled, lab-based studies published at CHI in previous years than papers describing "HCI systems" deployed in the wild. As a result, the CHI reviewers' pool is biased towards researchers who are experts in lab-based controlled studies. Hence it is more likely to have experts in tightly controlled lab studies to evaluate submissions that report results from "deployments in the wild" rather than the other way around. Although we have subcommittees, the same reviewer pool is shared by all subcommittees.


Possible solutions? In addition to dividing ACs into subcommittees, we may also divide the single reviewer pool into multiple reviewer pools. Each reviewer can volunteer in multiple pools, but he/she is only allowed to report himself/herself as "expert" in one pool by default. Assuming that the number of "HCI systems" submissions is not too big, things might become better if these submissions get reviewed by people who published system papers at CHI and appreciate the challenge and efforts in building systems that have been deployed in the real world. Of course all these suggestions are based on the assumption that "HCI systems" are welcome at CHI.

yardi said...

It might be helpful to have a repository of past CHI papers and their reviews as examples—especially examples of good reviews (good reviews, not necessarily good ratings) if authors of and reviewers of some papers are willing to make their contributions public.

I don’t know what the histogram of reviewer expertise and experience is for any given year of reviewing, but it seems like this would be a service to newcomers to the field. As far as I can tell, the best way to learn to write reviews is to submit papers and get reviews yourself, and to volunteer to review, but that cycle takes a few years, perhaps at the expense of the review process itself.

(I agree that everyone looking at a few examples might bias us towards whatever the particular content was of those reviews but probably no more than we’re already biased by our own biases, and a larger repository would overcome this.)

This doesn’t really solve the larger problems described here, but it seems from the senior commenters above that the community generally agrees on what a good review is, even if it doesn’t agree on what a good paper is. Some visibility and transparency in the process might help newcomers to CHI to learn a little faster. Some grad students could make the repository as SV service. :)

Antti Oulasvirta said...

I've been following this conversation with great interest--worth every minute spent. Thank you James for your initiative and social courage.

I'm afraid that the problem of poor reviews is only a symptom of something more serious. I believe that the underlying cause is that we do not have a shared consensus of what good research is in our discipline. A shared view of "good science" is a precondition for consistent, high-quality reviews (in this argument, I'm inspired by work of Jeremy Birnholtz on high energy physicists at CERN).

But do we have consensus, or are the most foundational questions of our discipline still under discussion? Papers like Design Implications (Dourish, '06), Ethnography considered harmful (Crabtree et al., '09), Softening up hard science (Carroll & Campbell, '86), Three faces of HCI (Grudin, '06), Psychology as science of design (Carroll, '97), Let's stop pushing the envelope (Whittaker et al., '00)--just to mention a few of my favorites--reflect the fact that we do not have consensus on the subject of study, epistemology, ontology, criteria of progress etc.--i.e. the features that define any discipline.

Thus, the problem of underappreciation of diverse contributions at CHI cannot be solved by escaping to new journals or conferences. "Popularity contests," new subcommittees, modified acceptance rates, or rigorous review processes can only help us find a local maximum, but the global maximum will not be reached by these measures. Asking reviewers to do a better job is as impossible as asking a broken instrument to report reliable results. We need to fix the instrument.

Unfortunately, there is no easy solution in sight. When technology was simpler, solutions were simpler, and so were the demarcations that define our science. But now, as our subject of study is a moving target, we have failed to update our views of our science. I'm not a philosopher of science, but to me this seems like a state of anomaly in Popperian terms.

To work our way out of this cul de sac, we need to fight on two fronts simultaneously: 1) search for the local maximum by fine-tuning the conferences and 2) search for the global maximum by identifying the foundations of our discipline that everyone can agree on. This will strengthen the identity of the main conference, but may naturally also result in the divorce of groups who disagree. Many divorces have taken place in the past (CSCW, Ubicomp, HFES, UIST, MobileHCI, ...), but have they reached unique scientific identities, or is part of the problem that the divorces were too hasty, leaving foundational questions unanswered? I believe that this continues to hamper our ability to see real differences among the many conferences we have.

Floyd said...

Funnily enough, I had started my rebuttal with a very similar argument (even the tag line was the same), so once the blog was up, I included some of your comments. I am happy to share what I wrote, as others seemed to be interested as well:

=======
In 1968, Dijkstra wrote ‘Go To Statement Considered Harmful’, a critique of existing programming practices that eventually led the programming community to adopt structured programming. Since then, CHI titles such as ‘Usability evaluation considered harmful’ and ‘Ethnography considered harmful’ have included the phrase ‘considered harmful’ to signal a critical statement that advocates change. This rebuttal is written in that vein.

Recent debate over how we, as the CHI community, assess the use of novel system work that tries to contribute to our understanding of how people _will_ interact with technology in the future, rather than understanding the past -that is how people are currently using or have used existing technology- has emerged, e.g. [Greenberg & Buxton. Usability evaluation considered harmful (some of the time). CHI’08], [Kaye & Sengers. The Evolution of Evaluation. CHI’07] and most recently [http://dubfuture.blogspot.com/2009/11/i-give-up-on-chiuist.html]

We can sympathize with the sentiments in these essays, as we have the feeling that our reviewers have assessed our work in a way one would assess existing technology, rather than seeing it as an outlook into the future, what it means for the user experience and how we can create better designs for a better future. There is nothing wrong with analyzing the use of existing technology; however, we advocate a different view when it comes to the review of the use of novel systems, as such studies are high-risk, and hence a different ‘hat’ needs to be worn when reviewing, as the viewpoint needs to shift from the ‘now’ and ‘low-risk’, to the ‘future’ and ‘high-risk’.

From an author’s perspective, it appears as if the following review process was undertaken: The initial score was set as 5.0, and for every flaw identified in the study, a point was taken off, resulting in the final score. Such an approach favors studies that involve low-risk, and hence is more suitable for evaluation work on existing technology. Assessing work with this kind of process will result in less experimental work at CHI, hence discouraging high-risk work, and an emphasis is put on analyzing the status-quo, therefore limiting the impact CHI can have on shaping the future of technology. Such concerns have been previously expressed within the CHI community, see the references above, discussions at CHI (when low-risk work is presented) and in the many comments to James Landay’s blog post (URL above), including from key figures such as Grudin, Buxton, Kaye, Greenberg, ...

With this in mind, we would have rather liked to have seen reviews that have started at a score of 0.0, and added a value point for every new insight our work provided, every new thought it provoked, every creative spark it ignited, while at the same time acknowledging the limitations of studies that investigate novel systems.

[specifics omitted]

In sum, we would like to encourage a re-think in regards to the comments made by authors of the references and blog posts above, and suggest considering a re-read with our proposed approach of starting with a score of 0.0, adding points while assessing the work. Of course, we expect a different score using this method, but are also keen on hearing about suggestions for guidance for future work that we would like to present at CHI. We are looking forward to reading about the outcome of this rather unusual rebuttal, and appreciate your time in reading this, as we feel this is an important issue for CHI.

Jake said...

Great thread, everyone. I just wanted to add a note to suggest that the problem here is one of epistemology: when a reviewer reads a paper, he or she wants to know how to trust the results the writer is showing. For quant empirical papers, this trust is built on sound method and usually statistical results. The empirical epistemology, whether pointed at a novel interaction technique or not, is in CHI's blood. The problem then becomes when reviewers want EVERY paper to be forced into this square hole.

Research is meant to further knowledge. Knowledge may be "knowledge that" or "knowledge how." I think of a good systems paper as providing a demonstrational contribution: knowledge how to do something formerly impossible. To this end, I don't want to see a dummy experiment that shows what we already knew. Just forego the study completely and let the demonstrational contribution ride high on its own.

I'll note that even qual empirical work still does not enjoy a shared epistemology. And other contributions (demonstrational, theoretical, methodological, design, survey, and editorial; what others?) get varied, often uninformed, reactions from reviewers. (Non-empirical design contributions are especially abused when reviewers ask why an experiment wasn't run.)

In the end, we need to educate our students and each other about what constitutes a sound epistemology for each contribution type that CHI receives. We can't keep abusing non-quant-empirical work by applying the wrong epistemological lens.

Sean Gustafson said...

Am I the only one who believes the CHI review process works incredibly well? The reviews I have received and those of others I have been privy to are usually exactly as I expect. Sure there may be variation among the reviewers but the combined score reconciled by the primary is spot on. The score is based on the perceived impact of the work - its interestingness - and confidence that the presenters will be able to get the message across.

If a paper gets a low score and the reviewers complain about nitpick issues it is purely because those reservations are much easier to elucidate than a general feeling of potential impact. No one is counting small problems to come up with the score - they already have a score and are just trying to justify it.

Read between the lines of the niceness. No one wants to tell anyone that what they are working on just isn't interesting and they should give up, so instead they pick apart the study and related work. They are allowing you to save face, submit to a lesser conference (or a venue with different measures of impact), and move onto something better.

Anon said...

Scenario:
- There is a generally known problem which has been approached in a few different ways but not to much overall avail.
- A researcher thinks they have an idea for an iPhone app which will help solve the problem.
- They build it and try it out with a few users who are not the ones with the problem but who are part of the process that has the problem.
- There aren't any practical examples as a result of this 'trial' (as one would expect).

The CHI submission:
- does not explain how decisions were made when building the app to try to help solve the problem.
- makes assumptions about the problem itself without justifying those assumptions.
- does not show how the app helps solve a realistic example of the problem.

Some would reject this work for one or both of two reasons. First, it doesn't look like "good research" but rather a pet project of building a tool which may or may not be of any use to anyone. Second, letting such work into CHI sets a higher bar for a legitimate attempt at trying to solve the problem in a later year without this work having helped the field in any way.

Others would say "it is a good idea to be working on and the researcher had to build the app" and accept it.

Which is closer to the right answer on whether to accept or reject the work?

Anonymous said...

critical begets critical

Reviewer #1 is reading a paper which is similar in method and type to a paper she submitted to CHI last year. This paper is almost as good as the paper she submitted last year. That paper had gotten all '3' ratings with comments about needing more subjects and more control. She decides that since this paper had those same issues, and wasn't as good in other ways that she will rate it a 2.5 overall.

How do you convince Reviewer #1 to be more generous to others than others were to her?

jmankoff said...

I've been thinking about this since you posted it, and I finally have a (different) concrete suggestion. I am attending BECC right now, a conference in a field that is journal based. I've come to the conclusion that our conference-based model is the problem. I think our community needs to embrace abstracts and posters as a real way to contribute to meetings. Take our reviewers' artificial definition of quality out of the equation. People will put their best foot forward even without rigorous peer review (that's socially appropriate), and the work will be interesting, novel, and broader in nature if this succeeds. Of course the work also needs to be published, and getting our journals to speed that up would be a nice counterpart to my proposed solution. TACCESS already has a 6-month turnaround. I'd love (concretely) to see alt.chi support this shift.

Keith said...

This is an important discussion about the health of CHI. I want to begin my comment by acknowledging that all the CHI reviewers and ACs are volunteers who donate their time and energy. There couldn't be a conference without them and I am grateful for their willingness to volunteer. That said, there is no place for condescending or derisive comments in written reviews. They are just out of place. ACs have the responsibility to maintain professionalism and there needs to be some mechanism for them to toss out bad reviews.

That said, from a manager's perspective there are several things about CHI that seem paradoxical. It's an applied field that is still so new that it is dominated by people who have chosen academic careers in research. CHI is also a field that has a much weaker theoretical base than many other technical fields, which increases the importance of analyzing application experience. My extension of Antti's suggestion is that we need to re-define what constitutes "good work" and "contribution" as the field matures. Reviewers who have not worked on software engineering projects may not have enough appreciation of how severe the constraints are on application projects. I would love to see well designed experiments to compare the impact of new technology (not the confounded one asked of Jim). But few production managers are going to keep their job if they allocate funds to do the same application project twice in order to compare the results. Did I miss the definitive experiment that compared OOAD with earlier forms of programming on industrial strength problems before it was widely adopted? Applied researchers and engineers both need to learn how well new ideas actually worked in advanced applications, and that's going to be messy. But CHI can't realize its full potential impact as a field without being applied successfully in this kind of engineering project environment. My read on Bill's first comment is that by excluding engineering work CHI really risks marginalizing itself.

Software engineering projects also include key contributions from several other disciplines and are often led by program managers from other fields. A project can't really be understood without somehow describing these contributions. Reports about these applications are critically important to advance CHI but all these factors add an even bigger scope for authors whose reports are supposed to fit into a 10-page paper. It's a similar challenge for reviewers who are supposed to assess such a wide scope.

In large part to address these kinds of issues the Engineering Community was established in CHI 2006. One important objective for the 2010 Engineering Community was to attract more submissions about CHI engineering research and application of CHI research to software engineering projects. A second objective was to increase the influence of reviewers who have the technical background, experience, and passion to review engineering work in CHI.

Based on this thread and other sources (including my own frustrations) I'd say we still have a long way to go on both objectives. But one positive step that I'd like to point out was the new, serious effort on case studies this year. Case studies now have up to 16 pages to allow for background, method description, project description, outcome, discussion, etc. They are archived and featured as talks in sessions that are the peer of paper sessions in the program. There was a much smaller set of submissions about engineering case studies, which allowed them to reach more qualified reviewers.

I'll be curious to see whether case-studies can provide a good forum for early applications of research and integrative design projects. Also, during the hand-off meeting to the CHI 2011 committee I plan to raise the need for ACs to use the Communities to help recruit qualified reviewers for paper submissions about engineering work.

Anonymous said...

As a normal researcher, I think everyone goes through the process of having a submission rejected. My opinion is that if you have a good paper, it will always be published somewhere. Therefore, passing judgement on one conference because of a rejection does not help you improve the quality. You must accept the fact that every reviewer tries to be objective, but we are human beings, so we are always a bit subjective.

Azam Khan said...

James, sorry to ask such a simple question (maybe I missed this somewhere in the discussion) but why don't you just start a SigCHI Subcommittee on Systems? Desney would probably be cool with that.

The next logical step beyond CHI Subcommittees is CHI Symposia, so you would eventually (sooner if you pushed for it) have your own symposium without losing the CHI brand.

Don't get me wrong; starting a new thing is fun, though a little stressful. I have veered off the HCI track a little bit and just started the Symposium on Simulation for Architecture and Urban Design (www.simaud.org). Luckily, it is part of an existing conference so a majority of the work is already being done...

Larry Constantine said...

This is a brave and brilliant post with many fine contributions. It obviously touched a sensitive and significant subject. I am first and foremost a practitioner, a working designer. For the most part, I don't do real research, incremental or otherwise, but I have been a persistent innovator for decades and believe that those of us working at the coal face of IxD and design methods have something to offer the CHI community. However, there are certain topics and stances that are completely unpublishable because they violate or question the accepted canon and received "wisdom" of the field. Examples are alternatives to ethnography-based field inquiry (one reviewer said "there are no alternatives, it is the only acceptable method") and alternatives to user testing. One paper in part about what to do when you CANNOT do user testing (there are such situations in the real world but not, apparently, in the world of CHI referees) has been repeatedly rejected because the project did not do testing. Duh; that's the point.
Anonymous reviewing is also a mockery because only the referees are anonymous; the authors are almost invariably known to the reviewers. On one recent rejection, a reviewer even went to some lengths to track down and verify who the authors were, then criticized us for failing to anonymize. (We had actually followed the published rules to the letter.) I have nearly 200 published papers including some classics and widely cited works, yet I have never been able to get anything published at CHI. If the old pros like me are doomed, pity the poor young academics needing "quality" placements. The reviewing process seems to have become increasingly capricious, with reviews that can be almost completely disconnected from the content of the paper. Even high scores can be ignored, as in one recent paper that was recommended strongly for acceptance by the reviewers, but the chair didn't like it, so the reviews were ignored and their recommendations were overridden. The scientific community puts great faith (the correct word) in blind refereeing, but even there, studies in the sociology of science suggest the process is broken. I am not sure "crowd sourcing" the refereeing is the answer, but we certainly ought to be trying some alternatives. CHI's "relaxed anonymous" model is a step, but a step too small.

James A. Landay said...

Thanks for your comments Larry. I'd like to see how new approaches, like what you hint at, could be published at CHI. Your work has had such practical impact, so I especially appreciate the comment.

Ryan Schmidt said...

Excellent (and depressing) post, James. I do interactive systems research in computer graphics, where the word "system" is the kiss of death, well-known to indicate second-class work. I have had reviewers insist that I clearly mark my abstract with the scarlet word, lest any reader be confused and think they were looking at a "technique" or "framework". I had been considering switching over to UIST, but it seems like that might be a mistake!
"The problem" with my field seems very similar to what you are saying here about CHI and UIST. At SIGGRAPH this year, the lifetime award winners (Rob Cook and Michael Kass) used their acceptance talks to (if I may brazenly paraphrase) ask why SIGGRAPH has become so boring, and encourage reviewers to be more generous with risk-taking research. We even had our own version of your post, when Michael Ashikhmin "quit graphics" in 2006. There was an emergency session at SIGGRAPH that year, after which....nothing changed.
It seems like we have the same evaluation problem, too. SIGGRAPH reviewers have recently started cribbing from CHI, demanding that the kinds of studies they see in CHI "interaction-technique" papers be applied to new 3D modeling systems. Testing with real users seems to be a waste of time – it is preferable to do an absurd (but measurable) comparison in the lab.
"Sciencification" is the underlying issue, in my opinion. The consensus seems to be that computer science needs grow up and become a real science, and you can't be "doing science" unless you are measuring something. I was recently told that “if you don’t have some kind of evaluation metric, then you’re really just randomly sampling a high-dimensional space”. The implication was clear – find a squared error, or statistic, or just something, *anything*, that will make your paper easy to review in an hour or less. Or wise up and do something easier to evaluate.
Maybe I’m a pessimist, but I don’t think the system can be changed. The simple fact is that unless the field is shrinking, new researchers outnumber the old. Anyone graduating now was raised in the current system, where getting a job/grant/tenure means optimizing for paper count, and the best way to do that is to stick with safe, easy-to-review work. And guess who is going to be running the papers committee in a few years…
I think for most fields (that survive), there is an interesting part at the beginning, where it is small and dynamic, and then it gets big and dull, because we publish or perish, and the law of large numbers guarantees that the average paper in a big field is going to be…average. So, I vote for starting UISTSys. It will be small and interesting, at least for a while.
( this is long, but you might find it interesting: http://www.cs.utah.edu/~michael/leaving.html )

James A. Landay said...

Ryan,

Great comparisons to issues going on at SIGGRAPH! I appreciate your commentary here and would like to keep in touch about how to reform the system for systems work. :)

Jeff said...

Ryan, I love that "Sciencification". Is this, perhaps, a case of "be careful what you wish for"? Computer science in general has had a persistent complex about whether or not it's really a science (any field that has science in the title...). As a result, the field has arguably been steadily drifting from its "invent something cool and see if people find it useful" roots (too engineering focused) to instead focus on measuring phenomena ("if we can measure it, it's science"). You could argue (depending on your definitions) that CHI, UIST, and SIGGRAPH are certainly more scientific now than 10 years ago. Of course, they're also (for those of a more systems-y, engineering bent) much less interesting.

I've personally begun to feel that a related problem with much of the research community is what could be termed "Capitalification" (to keep it in line with Sciencification). People seem much more interested in doing Science and Research these days than science and research. My dictionary defines "research" as "diligent and systematic inquiry or investigation into a subject in order to discover or revise facts, theories, applications, etc." My working definition of "Research" is "work that is likely to be accepted in an academic conference", and is characterized by strong differentiation between Research (can be published) and "advanced development" (which can be ground-breaking and innovative but is difficult if not impossible to publish). Companies like Apple, Google, Facebook, and Twitter do "advanced development" because even though they have a fundamental impact on our use of computers, they don't publish; never mind that the dictionary definition of research is arguably a strong fit to what they do. However, studying mockups that don't work beyond the lab and that no one uses for more than 30 minutes is Research because you can publish it. Similarly, a dictionary definition of science is "knowledge gained by systematic study", while Science appears to involve knowledge gained by running a laboratory study and doing some statistical analyses (Science appears to require p-values).

However, I suspect that to a large extent this problem is self-correcting. Conferences like CHI and SIGGRAPH are dependent on attendees, and academic research is dependent on funding. If conferences and Research become sufficiently irrelevant, people will stop paying attention and resources will dry up, in which case Researchers will have to adapt.

I suspect that we may see a situation with parallels to newspapers. When the tools to build innovative software and/or hardware are easily and cheaply available and tools like blogs and open source software repositories make it easy to share those innovations, will people really remain reliant on walled gardens like conferences and journals for distributing their content?

Timo Ojala said...

Interesting discussion on "scientific novelty" vs "systems engineering" ...

Worth reading in this context:

Sharp, R. and Rehman, K. The 2005 UbiApp Workshop: What Makes Good Application-Led Research? IEEE Pervasive Computing, vol. 4, no. 3, pp. 80-82, 2005.

Same issue, although on a somewhat different playground of ubiquitous computing.

-Timppa

Jose Rojas said...

Just to bring a bit of comic relief to this real issue in CHI:

http://www.youtube.com/watch?v=-VRBWLpYCPY

Jose Rojas

James A. Landay said...

Ryan, interesting comments... The one issue I think you miss is this: "...will people really remain reliant on walled gardens like conferences and journals for distributing their content?" I don't think these are really going away. People will use blogs and other ways of distributing content ALSO. But as long as universities exist in their current form, academic researchers (professors and students) will need to publish in prestigious, peer-reviewed venues. That system will be MUCH harder to change.

James A. Landay said...

Thanks Jose. A bunch of people had sent that video to me (and I had seen it online around the same time). Quite funny (if Hitler can be funny), but I didn't think to post it!

Ivan Poupyrev said...

I guess the discussion is over, but I believe that starting a new conference will not help as long as the reviewing process is anonymous. Professionals should have the courage to stand by their opinions and defend them if needed, not hide behind anonymity.

Just as a side note: the tech blogging community despises anonymous commenters as cowards and regards them as trolls. So what does that say about the CHI community?

James A. Landay said...

Ivan says: "starting a new conference will not help as long as the reviewing process is anonymous. Professionals should have the courage to stand by their opinions and defend them if needed, not hide behind anonymity."

I see your point of view, but on the other hand anonymity can help people say what they really think without social pressure to be overly nice.

I have been thinking more about this recently and believe that systems-oriented papers tend to be rather dense, cover a lot of ground, and simply have to leave some things out. The quick-review journal model, which has several rounds of back-and-forth with the reviewers, would be better at helping these papers get over the bar for publication, as a subset of the necessary ambiguity is teased out of the manuscript. I think you won't have a major problem with this issue of "reviewers hiding".

As such, I'm getting ready to propose a new electronic journal to ACM on HCI Systems & Applications. The idea would be to link it with an existing conference (e.g., UIST) and have any papers accepted by the journal by 3 months before the conference appear at that conference.

Bo said...

Please, please, please call it Transactions on UIST so we can call it Twist. Please. I'm begging.

lieber@media.mit.edu said...

James,

Hear, hear. For those new to this debate, I stirred things up with my CHI 03 "alt.chi" presentation, "The Tyranny of Evaluation", http://web.media.mit.edu/~lieber/Misc/Tyranny-Evaluation.html. Comments on that original rant appreciated.

I wouldn't give up on either CHI or UIST, despite the frustration. If new venues appear that are more congenial to innovation, let's take them. But I do think that CHI/UIST has made an honest effort to at least listen to these concerns and respond. Witness that I was led to this blog post by the UIST review instructions! Right now, I think the CHI/UIST management understands the message better than the vast body of CHI/UIST reviewers, which accounts for why authors still get these kinds of reviews. The conference committees still have to work with whatever is submitted to them and whatever the reviewers say. But it is still the committees that decide, so I encourage young innovators to keep at it and still submit papers and review papers, to give the committees material to push forward on making the conferences more innovative.

On concrete suggestions for conference structure, I'd make two. One, OOPSLA (now SPLASH) has a section called Onward, explicitly for more speculative and less rigorous work. CHI could emulate this. Second, in my experience, reviewing works much better when reviewers choose the papers they want to review rather than having them distributed by a committee. The biggest source of incompetent and hostile reviews is when people are thrust into reviewing papers they don't want. AAAI has a reviewer bid system that works well.

Henry Lieberman

No one in particular said...

Hi, James. I'm coming very late to this discussion (I discovered it when I typed UIST 2011 into Google--it's the top hit) but thought I'd chip in a comment, since this is something I've thought about for a while.

When I carry out a small experiment like a Fitts' Law study of a novel interaction technique, I generally have a few goals in mind: I want to understand whether a given technique works; I want to understand its generality, *why* it works, at some level of abstraction; and I want to persuade others to take up either my findings or the technique itself.

When I've built non-interactive systems and written about them for publication outside CHI venues (e.g., in AI or cognitive modeling), my goals have been comparable, but with the main emphasis being on the first point, demonstrating that a system actually does what I claim it can do.

I've also built interactive systems. These are much harder to sell, as everyone agrees. Papers have been rejected because comparisons with existing systems in common use were too limited in scope, because performance improvements were too small or too localized, or even because we didn't carry out a summative evaluation, thinking that the novelty of the work would carry the paper, based on demonstrations. I've sometimes wished that, once we've demonstrated that a system basically works, reviewers would consider the question, "Does this system suggest a new direction for HCI?" Is it novel, is it plausible, does it have potential to change the way people think about interaction? These questions aren't easily answered, I think, in the usual way we review papers, even though some aspects, like novelty, are commonly part of our guidelines for reviewing.

One of the institutional barriers, mentioned briefly in one of Jonathan's comments, is that conferences like CHI and UIST are now considered archival. I suspect that this, along with other factors, leads to a reluctance to take risks--we might not want to accept formative ideas that could turn out to be misguided or even wrong (which we can't judge without an enormous amount of further effort). "Risky" papers can be published elsewhere (at workshops, in alt.chi, as extended abstracts, etc.), but I don't think they get nearly the attention that safer full papers get at a high-profile conference, and of course they don't count as much on a CV.

I'd like to see a venue that emphasized looking forward and taking risks. Quality control would be harder but still manageable, I think. Systems papers would mainly be to provide inspiration rather than a foundation for carrying out usability studies. Most of the usual evaluation work (how well does it perform, etc.) would be left for the future.

-- Rob St. Amant

Anonymous said...

Hi,

I definitely feel the same thing. I have over 100 papers. The journal/conference reviews I get are all useless. Sometimes bull****. I also review for CHI, but I am from communications.
The *top* professors in our fields should stop publishing and putting their names.

Once someone is a professor, he should start looking at placing his work on the web rather than papers to claim his contributions.

BAN all full professors from publishing. Let them do only reviews.

Krystian Samp said...

My understanding is that CHI is a different thing than it claims to be, and consequently a different thing than many people expect. CHI attempts to be a venue for researchers and practitioners. But the majority of the reviewers are researchers who favor rigorous experimentation (atomic tasks, simplified settings, statistical analysis). And I would argue that rigorous experimentation is a good thing and a fundamental role of research. The problem is that the ‘practitioners’ part of CHI is only an illusion. The same reviewers judge design or system papers. But these papers are not research or scientific work. And as such they should be judged on a different basis. Practitioners value exploration and generation of ideas, learning about new perspectives, ideas that failed, how systems are designed and built, etc.

As a by-product we have system papers including controlled experiments which make little sense but let the research reviewers find what they look for. This is not to say that controlled experiments are not useful in system papers.

The acceptance procedure at CHI is also questionable. It is common to get both high and low scores and contradictory reviews. And PCs take an average of those. This is a blunder and irresponsible behavior. In such cases a PC should read the paper and use their expertise to take one of the sides. Otherwise what is the point of putting HCI experts on the committee as PCs?

Looking at the rapidly increasing number of submissions and accepted papers at CHI, I get a feeling that new venues with distinct objectives are necessary. In my opinion this would be a good thing for the community.

Anonymous said...

It's amazing that this post is over 2.5 years old, read by thousands and it still perfectly applies to the last round of UIST reviews. It's very sad.

De Rooms Brecht said...

As a fresh-off-the-boat PhD researcher, I was shocked by the CHI 2013 conference, which I was attending for the first time. While the presentations are of good quality, the work behind them often seemed to be refurbished work from 5-10 years ago. This was especially surprising given how devastating the reviews are perceived to be at our university, where I often hear phrases like "make sure your evaluation is very strong", since reviewers will immediately shoot you down. Yet I arrived there and it seemed as if a lot of the evaluations and related work research were done in such a way that they fit the goal.

Even more striking is that everyone seems to like the work and there is almost no positive critical input. I was glad to see Bill Buxton step up at a certain presentation and tell the presenter that inaccuracies in research might cause problems for researchers coming after him.

Personally, CHI2013 completely took away my idealistic view of research and partly demotivated me. I can now see what it is all about: create your work in such a way that it targets a good conference, and minimize the work you spend on it, in order to publish, publish, publish, publish.

I can now see why more and more similar applications, frameworks, or research results are emerging which are less and less usable/bug-free/accurate. They were never meant to be used in the first place; they were meant for a one-shot paper.