July 03, 2009
An excellent dissertation
Here's a short, reasonably good report on a study of faculty opinions about what makes an excellent, good or unacceptable dissertation, from a book (Developing Quality Dissertations in the Social Sciences, B. E. Lovitts and E. L. Wert, Stylus Publishing, 2009).
Posted by jmm at 02:46 PM | Comments (0)
May 07, 2009
Content to contribute: Wikipedia
From time to time I find pages in my areas of professional knowledge that seriously need improvement. On my long to-do list, editing Wikipedia never seems to make it to the top. But I might as well start a list in case I am looking for something to do in the future, or better yet, to suggest as an exercise for graduate students in my area.
Today I noticed:
- Incentive compatibility: For example, the article says that there are different "types" of IC (dominant strategy, Bayes-Nash). These aren't different types of IC. IC is a constraint (or sometimes a desideratum), and one can impose it on problems which we solve under different rationality assumptions. (This isn't a very good statement either!) Also, Bayes-Nash is defined incorrectly (the definition given is for Nash more generally.)
- Strategyproof: This one is really dreadful. The concept is defined incorrectly at least once (and the mere fact that it is defined more than once in a single entry is not good): the claim is made that "strategyproof" is equivalent to incentive compatibility + individual rationality. NOT. Also, the rather absurd claim is made that the concept is "most natural to the theory of payment schemes for network routing". I can't even fathom what metric one might use to measure whether a concept is more or less "natural" in various settings, but in any case, it seems absurd on its face to privilege network routing applications over all other applications for which dominant strategy constructs (such as strategyproofness) are useful. I actually looked this one up because I heard someone use the concept incorrectly in a research presentation, and that reminded me that a careful definition for strategyproofness is rarely stated, though it is used quite often.
If you happen to pick up on one of these and do some editing, be sure to note it here!
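For reference, here is one common way to state strategyproofness (dominant-strategy incentive compatibility) for a direct mechanism. The notation is an assumption introduced for illustration, not taken from either Wikipedia entry: each agent i has a type θ_i in Θ_i, the mechanism f maps a profile of reported types to an outcome, and u_i(x, θ_i) is agent i's utility from outcome x when its true type is θ_i.

```latex
% f is strategyproof if truthful reporting is a dominant strategy for every agent:
\[
  u_i\bigl(f(\theta_i, \theta_{-i}),\, \theta_i\bigr)
  \;\ge\;
  u_i\bigl(f(\theta_i', \theta_{-i}),\, \theta_i\bigr)
  \qquad
  \text{for all } i,\ \theta_i, \theta_i' \in \Theta_i,\ \theta_{-i} \in \Theta_{-i}.
\]
```

Nothing in this condition says an agent is better off participating than staying out, which is why individual rationality is a separate requirement and not part of the definition.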
Posted by jmm at 01:32 PM | Comments (2)
April 21, 2009
What makes good qualitative research?
The debate between "qualitative" and.... non-qualitative (it's not all "quantitative"!) research has been going on for many many years. Qualitative research includes ethnography and various methods based on interview and detailed field observation, often of a relatively small number of cases. Typically, qualitative research eschews the more traditional approach to scientific research, described by King, Keohane and Verba (Designing Social Inquiry (1994)):
start out with clear, theoretically anchored hypotheses, pick a sample that will let you test those ideas, and use a pre-specified method of systematic analysis to see if they are right.
Quals claim their work is underappreciated and underfunded; non-quals criticize qualitative work as "unrigorous, unreplicable, unfalsifiable" (John Comaroff, in Michèle Lamont and Patricia White, Workshop on Interdisciplinary Standards for Systematic Qualitative Research (Washington: National Science Foundation, 2009), available at http://www.nsf.gov/sbe/ses/soc/ISSQR_workshop_rpt.pdf, p. 37.)
Howard S. Becker, one of the leading qualitative sociologists, recently wrote an essay elucidating this debate, and offering some criteria for good qualitative research. He bases it on a review of two NSF reports released (one in March 2009) on the use of qualitative methods. (This is the same Becker known to many of us as the author of Writing for Social Scientists.)
I enjoyed reading this, as someone who has long struggled to understand what criteria are useful for judging whether qualitative research is "good" or not. What constitutes a contribution to knowledge? While Becker's criteria, unavoidably, are a bit, well, qualitative, he offers specific characteristics to look for, and I find his list convincing, at least as a set of necessary conditions, if not sufficient.
My main beef comes down to this: Qualitative scholars often describe their work as "exploratory", and sometimes say that its purpose is to generate "grounded theory". I'm all for creative insights and hypothesizing. But how much of a contribution to knowledge is it -- especially if the hypotheses can't even stand alone as rigorously true logical deductions (which may be surprising and enlightening on their own) -- if no one ever follows up the exploratory hypothesis generation to actually test, with reliable methods, whether those hypotheses are more or less supported by sufficient, and sufficiently controlled, evidence to change our priors?
Posted by jmm at 06:10 PM | Comments (0)
June 10, 2008
Olin Shivers's Dissertation Advice
My colleague, Brian Noble, pointed me to Olin Shivers's dissertation advice.
Shivers makes one main point, but makes it well: a thesis is an idea, and a dissertation is a document that supports your thesis. This clarifies a lot of thinking about the task, to wit,
You will know what things are essential, and what things are distractions or detours. You will know when to stop writing: when you have demonstrated your thesis. If your thesis committee makes unreasonable demands of you, you will be able to tell them: "(a) My thesis, as stated, is a solid advancement of the field, and (b) I have supported my thesis. This is all I need to do to graduate; your requests are above and beyond this threshold. Cancel them and give me my degree."
and
A side benefit is that it provides an unassailable defense to an entire class of attacks on your work. For example, should someone attack your work by pointing out that it does not scale, you simply reply, "You may be correct, but right or wrong, your point is irrelevant. My thesis is that 'crossbreeding gerbils with hamsters provides an order of magnitude speedup over standard treadmill technology.' I clearly demonstrate factors of 12-17 in my dissertation; I make no claims beyond an order of magnitude." This is one of the benefits of focus.
In between he writes pithily about good writing.
You might also enjoy Shivers's advice on the night before your thesis defense.
Posted by jmm at 05:05 PM | Comments (0)
October 23, 2007
Sleep and scholarship
I have been chronically sleep-deprived since college. Perhaps as a consequence, I have become interested in sleep research over the years (and I have been diligent about trying to teach my kids good sleep hygiene!).
Not a lot is known about the role of sleep for cognitive activities, but much more is known than a couple of decades ago. What does this have to do with scholarship? Many research studies indicate that long-term memory formation, learning, complex skill performance, and creativity are strongly affected by sleep patterns.
A good place to start learning about sleep research is Stanford Professor William Dement's The Promise of Sleep. He explains the basic physiology of the sleep cycle and summarizes the state of sleep research (as of about 2000), with interesting results on memory, reaction time, learning, etc.
A lengthy article in today's New York Times reports on research by Dement, recent work by Prof. Matthew Walker at Berkeley, and others, on the role of sleep in learning and memory. For example, there is a large body of evidence now that the period of deep sleep that occurs relatively early during a normal night of sleep is crucial for encoding and strengthening declarative memory (like memorized facts).
Stage 2 sleep, on the other hand, which mostly occurs during the second half of the night, seems critical for mastering motor tasks (like playing the piano).
A story on LiveScience.com reports on other research by Walker showing that emotional responses to negative stimuli dramatically intensify in the sleep-deprived.
Po Bronson wrote another lengthy journalistic article summarizing research on sleep and learning in New York Magazine (2007).
One piece of suggestive evidence that I find particularly compelling (because of my passion for playing the piano): In their famous studies on deliberate practice and expertise acquisition, K. Ericsson and co-authors reported that the best violinists got measurably more sleep than good violinists and teachers, and also took more naps (1993).
Posted by jmm at 12:24 PM | Comments (0)
October 21, 2007
Drago Radev's skill list for Ph.D. students
My colleague Drago Radev (with help from his former student, now a graduate, Jahna Otterbacher) has compiled a list of skills Ph.D. students should develop before they complete their degree (some are specific to natural language processing or computational linguistics). As with many things Drago does, this rather takes my breath away, and I think I don't score well enough for him on many (despite being 21 years past my Ph.D.!).
The list is long, so...
Prof. Dragomir Radev's Advice for Ph.D. Students
(with contributions from Jahna Otterbacher...)
List of skills that a PhD student in NLP/CL should have by the time he or she gets a PhD.
A new student should be expected to score very low on most of these criteria while one about to graduate should get very high scores on almost all of them.
- ability to build evaluation pipelines and perform evaluations for new tasks
- ability to locate and read the relevant papers on a "new" problem
- ability to come up with "easy" and "reasonable" baselines
- ability to find, download, install, and run existing software from third parties
- familiarity with machine learning, graph theory, linear algebra, calculus, combinatorics, statistics, and text processing
- understanding of linguistic phenomena and annotation
- understanding the variability of human judgements
- ability to write good narratives of experiments
- ability to write good overviews of existing research
- ability to develop and give presentations
- ability to discuss research with other team members
- ability to see a problem or an approach from a very broad perspective
- ability to assess the feasibility of a problem or approach
- ability to plan a research project and execute it over time
- intuition to try alternative methods
- understanding of the relative advantages and drawbacks of general methods across problems
- ability to implement in code generic algorithms and to make appropriate modifications to them
- understanding of related sciences such as bioinformatics, artificial intelligence, etc.
- understanding of computational complexity
- understanding of the fundamental data structures and algorithms
- familiarity with the availability on the Web of relevant corpora, papers, and tools
- excellent understanding of UNIX, including process control, scripting, and backup
- ability to build web-based and local demonstration systems
- ability to describe one's research to others with different levels of overlap in backgrounds with the student's
- understanding of project management: CVS, documentation, modularization, portability of code
- knowledge of a number of programming languages: C/C++, Java, perl/python, matlab
- ability to plan one's time, esp. wrt. courses, travel, committees
- ability to read a paper and abstract its main points - both strengths and weaknesses
- ability to draw charts, diagrams, screen snapshots, and other illustrations for papers
- ability to write quick scripts to convert data from one format to another
- ability to write quick scripts to test existing libraries or external software
- ability to write quick scripts to evaluate experiments
- ability to teach the introductory class, as well as plan it and grade it
- ability to relate one's work to similar problems in related research areas
- ability to store and retrieve data in a database system
- ability to write interfaces to existing resources: both local and Web-based
- ability to network with colleagues
- ability to promote oneself
- ability to organize events: colloquia, external visits, etc.
- ability to build an end to end system
- ability to take initiative and to propose new projects
- ability to write proposals for funding
- ability to elicit assistance from advisers, fellow students, and others.
- ability to ask intelligent questions at talks
- ability to design and perform user studies
- ability to request and obtain IRB support for user studies
- knowledge of a range of research methods, and an ability to read and give feedback on colleagues' work (that is not necessarily in his or her own area of interest).
- ability to initiate collaboration with others.
- knowledge of people from whom he or she can ask and receive helpful feedback on his or her work.
- knowledge of research communities in which to become an active member, get good feedback on his or her work and get exposure of his or her work to others.
- awareness of his or her key strengths as a researcher and future teacher (for people with academic career aspirations). Learn how to emphasize his or her strengths and use them to have impact.
Posted by jmm at 01:42 AM | Comments (0)
October 02, 2007
Should scholars rely on Wikipedia?
As soon as Wikipedia achieved critical mass, students began citing to it, and professionals and other writers have followed suit. Should research scholars rely on Wikipedia?
Neil Waters, a professor in the Department of History at Middlebury College, thinks that Wikipedia is a good place to get ideas, to get an initial introduction to a topic, or to get leads on references to pursue. He thinks students and scholars should not rely on it, however (that is, in scholarly currency, should not cite to it as a reliable source). He has published a short, cogent essay presenting his argument in the Communications of the ACM.
I agree with Waters. Indeed, Wikipedia agrees with Waters. This is not an attack on Wikipedia: it is a long-standing and general principle about not relying on (or citing to) tertiary sources in scholarly research, which includes all encyclopedias (even the venerable Britannica). The problems posed by Wikipedia are special, and of special concern, especially for less popular topics, but the principle is general.
One of Wikipedia's principles is "no original research", and all fact assertions are supposed to be documented by citations to primary or secondary sources. The latter guideline is followed only partially, but it is one of the quite useful features of Wikipedia for scholars: get an introduction to a topic, and then start following the references to more reliable source material.
Posted by jmm at 08:50 AM | Comments (0)
February 21, 2007
Should the digital revolution lower standards for truth?
Should students or scholars cite to Wikipedia as a reliable source? I admit that I have cited Wikipedia once or twice, though only to provide an informal definition and examples of a recent concept (for example, I recently pointed to it for emerging variants on spam such as spim, splog, spit, etc.).
The Middlebury College History Department has ruled that its students may not cite Wikipedia in research papers or exams (via NY Times). This was prompted in part by six students who recently made the same error by relying on Wikipedia to study for a Japanese history exam.
My inclination is to agree. Rapidly decreasing costs of communications and computation gave us networked information resources, which provide much faster and cheaper access to vast quantities of information. A somewhat unexpected consequence has been that many people are confusing accessibility with reliability, and quote willy-nilly because "it's on the Internet". If more information is more readily available, wouldn't we expect to see people become more selective in picking sources? Certainly, that is what I think we teachers and scholars should promote: a higher, not a lower standard.
The leaders of the Wikipedia project apparently do not disagree. Founder Jimmy Wales is quoted in the NYT article as saying that students shouldn't rely on any encyclopedia as a citation for research. The following statement appears (at the moment!) on the meta-page Wikipedia:About,
While the overall [quality] trend is generally upward, it is important to use Wikipedia carefully if it is intended to be used as a research source, since individual articles will, by their nature, vary in standard and maturity.
Interestingly, one of the three core principles for Wikipedia content is that it be verifiable.
"Verifiable" in this context means that any reader should be able to check that material added to Wikipedia has already been published by a reliable source.
While, if scrupulously and professionally followed, this principle would ensure that we could rely on Wikipedia as a reliable source, I think the main point is different: every statement in Wikipedia, if correct, can be found in a more reliable source elsewhere. Careful students and scholars can search out the more reliable sources.
Indeed, many people I know (including me) advocate using Wikipedia primarily in this way: as an introduction or convenient overview of a topic, identifying facts or ideas that the scholar then verifies elsewhere, in more reliable sources.
Posted by jmm at 02:47 PM | Comments (0)
January 01, 2007
Calculating scholarly impact
Scholars and their employers have long wanted metrics for measuring the importance or impact of a scholar's research output. Citation counts have been used for years, often based on the citation indices published by ISI. Recently many have started doing citation counts using Google Scholar (GS). Judit Bar-Ilan wrote a scholarly article comparing ISI, GS and Citeseer.
Recently, there have been various attempts to create metrics that are more informative than merely counting citations. The current favorite seems to be the h-index, suggested in 2005 by Jorge E. Hirsch at the University of California, San Diego (An index to quantify an individual's scientific research output, arXiv:physics/0508025 v5 29 Sep 2005). The Wikipedia article has a good summary. Two others are the g-index (Leo Egghe, Theory and practice of the g-index, Scientometrics, Vol. 69, No 1 (2006), pp. 131-152), which gives more weight to highly cited articles, and the contemporary h-index (Antonis Sidiropoulos, Dimitrios Katsaros, and Yannis Manolopoulos in their paper Generalized h-index for disclosing latent facts in citation networks, arXiv:cs.DL/0607066 v1 13 Jul 2006), which is parameterized to weight recent articles more heavily.
Here is a web-based h-index calculator (using citations from GS as its database). Note that any calculation is subject to error if the scholar's name is not unique; this tool provides a boolean keyword restrictor that attempts to ameliorate this problem. And here you can download a software tool that calculates h-index, g-index and others. This tool reports all of the articles used for the count, so you can check to eliminate those by different authors.
Using both of these tools, my h-index is 24 (although two articles with 24 cites also appear with 1 more cite to a listing with a typo in the title: when combined, my h-index is 25, a small example of the errors automatic tools can make).
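To make the definitions concrete, here is a minimal Python sketch of the two simplest metrics. It is not the code behind either tool mentioned above. It assumes the usual definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the g most-cited papers together have at least g squared citations (capped here at the number of papers, which is one common convention). The citation counts in the example are made up; they only mimic the kind of split-listing error described above.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g (capped at the number of papers) such that the g most-cited
    papers together have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    total = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical data: 23 well-cited papers, two papers with 24 cites each,
# and two stray one-cite listings that are really typo'd duplicates of those two.
split_listings = [30] * 23 + [24, 24] + [1, 1]
merged_listings = [30] * 23 + [25, 25]

print(h_index(split_listings))   # 24
print(h_index(merged_listings))  # 25
print(h_index([10, 8, 5, 4, 3]), g_index([10, 8, 5, 4, 3]))  # 4 5
```

Merging the duplicate listings by hand moves the h-index from 24 to 25 in this toy data, which is why a tool that shows the underlying article list is worth having.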
Posted by jmm at 03:12 PM
November 29, 2006
Feynman on Problem Solving
I thought the following passage by Richard Feynman was a nice statement of problem-driven learning. Don't read passively: try to figure out how to solve the problems yourself, using the book or article as a touchstone to check your ideas. (Feynman was one of the leading physicists of the last generation.)
In this quote, Feynman is initially referring to learning the basic theory of a computer as a set of commands that can perform operations. But the main point is how to learn from problem-solving.
Now there are two ways in which you can increase your understanding of these issues. One way is to remember the general ideas and then go home and try to figure out what commands you need and make sure you don't leave one out. Make the set shorter or longer for convenience and try to understand the tradeoffs by trying to do problems with your choice. This is the way I would do it because I have that kind of personality! It's the way I study -- to understand something by trying to work it out or, in other words, to understand something by creating it. Not creating it one hundred percent of course; but taking a hint as to which direction to go but not remembering the details. These you work out for yourself.
The other way, which is also valuable, is to read carefully how someone else did it. I find the first method best for me, once I have understood the basic idea. If I get stuck I look at a book that tells me how someone else did it. I turn the pages and then I say 'Oh, I forgot that bit', then close the book and carry on. Finally, after you've figured out how to do it you read how they did it and find out how dumb your solution is and how much more clever and efficient theirs is! But this way you understand the cleverness of their ideas and have a framework in which to think about the problem. When I start straight off to read someone else's solution I find it boring and uninteresting, with no way of putting the whole picture together. At least, that's the way it works for me!
Throughout the book, I will suggest some problems for you to play with. You might feel tempted to skip them. If they're too hard, fine. Some of them are pretty difficult! But you might skip them thinking that, well, they've probably already been done by somebody else; so what's the point? Well, of course they've been done! But so what? Do them for the fun of it. That's how to learn the knack of doing things when you have to do them.
Richard P. Feynman, Lectures on Computation (Perseus Publishing: Cambridge, MA), 1996, p. 15.
Posted by jmm at 10:04 AM | Comments (0)