2 Aug status
12 July status
- FlashLight partially works (no mouse/keyboard support)
- noVNC works (low performance, occasional screen pixel corruption)
- QEMU/KVM as backend (works; launch sketch below)
- VirtualBox, VMware, Xen (ToDo)
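For reference, a minimal sketch of how one test VM could be started under QEMU/KVM with its console exposed over VNC, plus the WebSocket bridge that noVNC needs to reach it from a worker's browser. The disk image name, display number and ports are assumptions for illustration, not the actual test setup.

    import subprocess

    # Assumed values for illustration; not the real test-farm configuration.
    DISK_IMAGE = "tribler-test.qcow2"   # hypothetical guest image with Tribler preinstalled
    VNC_DISPLAY = 1                     # QEMU display :1 listens on TCP port 5901
    WEBSOCKET_PORT = 6080               # port the browser-based noVNC client connects to

    # Start the guest under KVM, exposing its console over plain VNC.
    qemu = subprocess.Popen([
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", "1024",
        "-hda", DISK_IMAGE,
        "-vnc", ":%d" % VNC_DISPLAY,
    ])

    # Bridge the VNC port to WebSockets so noVNC in the worker's browser can use it.
    websockify = subprocess.Popen([
        "websockify", str(WEBSOCKET_PORT), "localhost:%d" % (5900 + VNC_DISPLAY),
    ])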
Crowdsourcing for GUI testing
Research angle:
Facilitating weekly automated human software testing for $40 per week
Advantages:
- Automated
- Robust
- Cheap
Human side:
- Can software testing exploit crowdsourcing?
- 10 workers test the software for 30 minutes (0.5 h x $4/hour x 20 workers = $40)
- Worker job completion
- Worker attention span
- Do workers read on-screen instructions?
- Returning weekly MTurkers?
- How much to pay them?
- Reputation of weekly returning MTurkers (Martha)
Technology:
- MTurkers use any browser
- MTurkers probably do not have HTML5-capable browsers
- Mouse lag (there is always latency)
- VNC, phpVirtualBox, Flash, JavaScript
- Fraud detection
- check if the task was completed (downloaded file, etc.)
- consistency across multiple workers (see the sketch below)
- crash capture and logging
- Multi-platform GUI testing (a lot of engineering)
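A rough sketch of the consistency check meant above: compare each worker's reported outcome for a task against the majority outcome and flag disagreeing workers for review. The report format and agreement threshold are invented for illustration.

    from collections import Counter

    def flag_suspect_workers(reports, min_agreement=0.5):
        """reports: dict mapping worker_id -> reported outcome ("success", "failure", ...).
        Returns the majority outcome and the workers whose report disagrees with it."""
        counts = Counter(reports.values())
        majority_outcome, majority_count = counts.most_common(1)[0]
        if float(majority_count) / len(reports) < min_agreement:
            return None, list(reports)   # no consensus: treat all reports as unverified
        suspects = [w for w, outcome in reports.items() if outcome != majority_outcome]
        return majority_outcome, suspects

    # Example: three workers report on the same download task.
    outcome, suspects = flag_suspect_workers(
        {"worker_a": "success", "worker_b": "success", "worker_c": "failure"})
    print(outcome, suspects)   # -> success ['worker_c']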
Approach:
- Try to get a first MTurk test by 15 June
- Test out various browser solutions
GOAL: Webpage with the results of 5-10 different tests which are automagically MTurked weekly. Each test is green when all testers reported success, yellow if one MTurker encountered a problem, and red in all other cases (see the colour sketch after the test list). Every test is clickable and shows, for all MTurkers, the complete log files of their work, plus a complete screen-capture video of their screen/application activity during the test. Tests:
- click on network buzz keyword and start download
- keyword search without suggest and start download
- keyword search with suggest after typing "2011 " and start download
- Pause and resume download
- subscribe to channels
- Conduct tests with both an empty megacache and a 50k-item megacache
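A sketch of the colour rule described in the GOAL above; names are placeholders, and the rule is exactly the one stated: green when every tester reported success, yellow when exactly one MTurker reported a problem, red otherwise.

    def test_colour(tester_results):
        """tester_results: one boolean per MTurker, True = reported success."""
        failures = sum(1 for ok in tester_results if not ok)
        if failures == 0:
            return "green"
        if failures == 1:
            return "yellow"
        return "red"

    assert test_colour([True, True, True]) == "green"
    assert test_colour([True, False, True]) == "yellow"
    assert test_colour([False, False, True]) == "red"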
Additional tasks:
- ToDo: disable the family filter; prevent users from conducting both the A and B test by making it a single HIT on MTurk
- Find success rate for various formulations
- "try to find out how to add something to your channel"
- "Locate the channel button and add something to Your Channel"
- "3rd formulation"
- A) try to understand the channel concept in Tribler B) discover where "your channel" is located C) add something to your channel
- Find success rate for various formulations (comparison sketch below)
- "try to download a single file from a swarm"
- "search for "blue suitcase", go to the files tab, select the file "vodo.nfo", click the "download selected only" button."
- A/B testing. Create two variants of Tribler and test success rate/task completion time.
- Search with and without the bundling feature
- A: bundling turned off/disabled
- B: bundling turned on
- Training search queries: "blue suitcase" (simple: one result), "TED Bill Gates", "big buck bunny"
- A/B search tasks: "Ubuntu 11.04", "Pioneer One" episode 2, "Sintel", "the yes men fix the world"
- Measure task completion time and its evolution over queries (from init until the start of the download!), variance within the test population, 95% significance? (see the t-test sketch below)
- Conclude: inconclusive whether this feature is good or not, but we've demonstrated that MTurk can be used for this sort of task
- Null hypothesis: the approach does not work; aim to reject it. Benchmark against the classical method.
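For the 95% significance question above, one way the A/B comparison could be evaluated is a two-sample (Welch) t-test on task completion times; the times below are invented placeholders, not measurements.

    # Requires SciPy. Completion times in seconds, per variant (invented).
    from scipy import stats

    times_a = [95, 110, 130, 88, 102, 121, 99, 115]   # variant A: bundling disabled
    times_b = [82, 97, 105, 90, 78, 101, 88, 93]      # variant B: bundling enabled

    # Welch's t-test: does mean completion time differ between the two variants?
    t_stat, p_value = stats.ttest_ind(times_a, times_b, equal_var=False)

    if p_value < 0.05:
        print("difference significant at the 95%% level (p = %.3f)" % p_value)
    else:
        print("inconclusive at the 95%% level (p = %.3f)" % p_value)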
GUI usability testing
HYPOTHESIS: Both experienced and novice users of P2P technology don't read anything in the GUI
Tools: task completion time, replay of the captured user mouse clicks + moves + GUI.
Danger1: noise in task completion time: workers may be doing other background tasks; cancel measurements with a non-moving mouse (see the idle-filter sketch after the tests).
Test0: do they understand the search results page
Test1: do they click/understand the frontpage tags
Test2: do they spot the second+third column for bundling results
Test3: Do they notice with bundling that the first hit represents a sample? (they don't read the "more")
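A sketch of the "cancel measurements with a non-moving mouse" idea from Danger1: drop gaps in the captured mouse trace that are longer than an idle threshold, so background activity does not inflate the measured completion time. The event format and the 30-second threshold are assumptions.

    def active_completion_time(timestamps, idle_threshold=30.0):
        """timestamps: sorted list of seconds at which a mouse move/click was captured.
        Returns elapsed task time with idle gaps (longer than idle_threshold without
        any mouse activity) removed, as a crude correction for background tasks."""
        active = 0.0
        for t0, t1 in zip(timestamps, timestamps[1:]):
            gap = t1 - t0
            if gap <= idle_threshold:   # continuous activity: count the whole gap
                active += gap
            # longer gaps are treated as idle time and dropped from the measurement
        return active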