Closed Bug 780831 Opened 12 years ago Closed 12 years ago

crash in libdvm.so@0x45... on JB

Categories

(Firefox for Android Graveyard :: General, defect)

15 Branch
ARM
Android
defect
Not set
critical

Tracking

(firefox17 wontfix, firefox18+ fixed, firefox19+ fixed, firefox20+ fixed, b2g18 fixed)

VERIFIED FIXED
Firefox 20
Tracking Status
firefox17 --- wontfix
firefox18 + fixed
firefox19 + fixed
firefox20 + fixed
b2g18 --- fixed

People

(Reporter: scoobidiver, Assigned: kats)

References

Details

(Keywords: crash, topcrash, Whiteboard: [native-crash][summary in comment 23])

Crash Data

Attachments

(2 files, 2 obsolete files)

It's #17 top crasher in 15.0b3 and #22 in 16.0a2 while only #485 in 14.0.1 and #93 in 17.0a1.

Signature 	libdvm.so@0x45dd0 More Reports Search
UUID	cad1fbe6-5d08-4521-b9fd-9242c2120807
Date Processed	2012-08-07 07:28:42
Uptime	964
Last Crash	16.2 minutes before submission
Install Age	3.0 days since version was first installed.
Install Time	2012-08-04 07:31:09
Product	FennecAndroid
Version	15.0
Build ID	20120731145644
Release Channel	beta
OS	Linux
OS Version	0.0.0 Linux 3.1.10-g52027f9 #1 SMP PREEMPT Thu Jun 28 16:19:26 PDT 2012 armv7l
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0xdeadd00d
App Notes 	
AdapterVendorID: grouper, AdapterDeviceID: Nexus 7.
AdapterDescription: 'Model: 'Nexus 7', Product: 'nakasi', Manufacturer: 'asus', Hardware: 'grouper''.
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ 
asus Nexus 7
google/nakasi/grouper:4.1.1/JRO03D/402395:user/release-keys
Processor Notes 	CSignatureTool: No proper signature could be created because no good data for the crashing thread (24) was found
EMCheckCompatibility	True
Adapter Vendor ID	grouper
Adapter Device ID	Nexus 7

Frame 	Module 	Signature 	Source
0 	libdvm.so 	libdvm.so@0x45dd0

More reports at:
https://crash-stats.mozilla.com/report/list?signature=libdvm.so%400x45dd0
It's #6 non-fixed top crasher in 15.0b5.
Keywords: topcrash
It's #145 top crasher in 15.0 and #19 in 16.0b1.
Keywords: topcrash
It seems to be a dupe of bug 782223 because one user hits both in 16.0b5.
It's #31 top crasher in 16.0.2, #3 in 17.0b4 (many dupes), and #51 in 18.0a2.
Keywords: topcrash
Crash Signature: [@ libdvm.so@0x45dd0] → [@ libdvm.so@0x45dd0] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab0a]
Depends on: 764756
Crash Signature: [@ libdvm.so@0x45dd0] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab0a] → [@ libdvm.so@0x45dd0] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab26] [@ data@app@org.mozilla.firefox_beta-…
Crash Signature: [@ libdvm.so@0x45dd0] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab26] [@ data@app@org.mozilla.firefox_beta-… → [@ libdvm.so@0x45dd0] [@ libdvm.so@0x45c90] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab26] [@ data@app@or…
Summary: crash in libdvm.so@0x45dd0 on JB → crash in libdvm.so@0x45... on JB
This now easily is the #1 topcrash on 17.0b7, and even though it's only a few reports so far, this also happens on 18 Aurora, so I expect it to be there for 18 beta as well. Right now, this most seems to happen with the Nexus 7, which now seems all to be Android 4.2 (but as seen in the history of this bug, we saw this with 4.1 just as much).

As it looks like the stack only has this libdvm frame, we really need STR here.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #5)
> This now easily is the #1 topcrash on 17.0b7, and even though it's only a
> few reports so far, this also happens on 18 Aurora, so I expect it to be
> there for 18 beta as well. Right now, this most seems to happen with the
> Nexus 7, which now seems all to be Android 4.2 (but as seen in the history
> of this bug, we saw this with 4.1 just as much).
> 
> As it looks like the stack only has this libdvm frame, we really need STR
> here.

Do we have any URLs associated with this crash?

kats - is there anything we can change to get a better stack here, or any recommendations for QA to try reproducing?
Assignee: nobody → bugmail.mozilla
Keywords: needURLs
QA Contact: aaron.train
There are some NSFW links in this set of URLs. 

http://pub.vn/diendan/threads/10151-The-Dark-Knight-Rises-Hiep-Si-Bong-Dem-Troi-Day-2012-HD-Online-PDVN.html
http://camfuze.com/jngyslate
http://www.gazeta.pl/0,0.html?pelna=tak
http://www.aq.com/play-now/
http://www.ea.com/nl/voetbal/fifa-ultimate-team
http://tfc.tv/News
http://seasonvar.ru/serial-2438-SHou_Luni_Tyunz-1-season.html
http://www.chicagotribune.com/news/local/breaking/chi-4yearold-boy-man-found-dead-in-yorkville-home-20121025,0,7883191.story?track=rss&utm_term=Chicago+Breaking+News&utm_content=News+from+the+Chicago+Tribune%27s+Breaking+News+Desk&utm_source=twitterfeed&u
http://www.hentaicrunch.com/kansen-5-the-daybreak-episode-1/
http://island.pigg.ameba.jp/main
http://edition.cnn.com/video/?hpt=hp_c3#/video/business/2012/11/06/pkg-lake-wall-street-us-election.cnn
http://www.gazillionaire.com/gazillionaire.php
http://www.ea.com/intl/football/fifa-ultimate-team
http://www.rojadirecta.me/goto/tv1.planetadeportesnba.info/
http://www.mtvu.com/music/go-radios-new-video-for-go-to-hell/
http://www.starwars.com/
http://m.channel4.com/4od/father-ted/series-3/3392012
http://www.worldstarhiphop.com/videos/video.php?v=wshhY5msT0ZoV64Gg1cT
http://www.zemtv.com/2012/10/26/8pm-with-fareeha-idrees-accountability-of-media-and-tv-anchors-26th-october-2012/
http://www.juegosflash.com/jeux.php?VIDJeux=1437
http://drhtv.tv/live-5140.html
http://authorlive.wiziq.com/aliveext/logintosession.aspx?SessionCode=nTS4BMb%2fbASFQ8ZHAcBkJQ%3d%3d
http://www.mangareader.net/546-42359-13/wolf-guy-ookami-no-monshou/chapter-29.html
http://it.tinychat.com/ricothc
http://www.ea.com/de/fussball/fifa-ultimate-team
http://www.free-tv-video-online.me/player/gorillavid.php?id=fpmaersmp4gq
https://play.google.com/store/apps/details?id=slide.cameraZoom&feature=featured-apps#?t=W251bGwsMSwyLDIwMywic2xpZGUuY2FtZXJhWm9vbSJd
http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&ved=0CCgQtwIwAg&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dz0s1YRj4GzM&ei=i8SHULmrHqW_0AGey4GwBg&usg=AFQjCNEAVUMHNvwkRQuoXre-cQK97XRl2A
http://www.out.com/entertainment/popnography/2012/11/06/watch-battlestar-galactica-blood-chrome-trailer
http://www.filipinochannels.net/watch/v-342378?title=TV%20PATROL%20-%20OCT.%2024,%202012
http://camfuze.com/liv1987?ticket=ST-5796346-JHG5CsvD9CkxwXy9XAyX-blog.camfuze.com
http://wetter.tagesschau.de/radarbilder/
https://www.google.com/search?q=pinoy+ako&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official
https://apps.facebook.com/avengersalliance/?fb_source=bookmark_apps&ref=bookmarks&count=69&fb_bmpos=2_69
https://www.finkmanufacturing.com/
http://www.google.com/search?um=1&hl=en&nomo=1&q=apple%20tablet&biw=1280&bih=629&ie=UTF-8&sa=N&tab=iw&ei=xF6JUO6uEonk8gTyuoHYBw
http://www.mangareader.net/546-51226-5/wolf-guy-ookami-no-monshou/chapter-60.html
http://newt7.adultadworld.com/jsc/z5/fm.html?n=607&c=5516&s=7864&d=15&w=1&h=1&p=7864&z=41364849
http://www.nfl.com/gamecenter/2012110100/2012/REG9/chiefs@chargers#menu=drivechart&tab=track
http://www.androidcentral.com/wordpress-android-updated-include-support-nexus-7-danish-language
http://www5e.biglobe.ne.jp/~aji/3min/10.html
http://atdhe1.net/
http://w001.bravely.jp/player
http://fadrax.enix.org/ren/kopc_25.swf
http://www.camfuze.com/smdgirl82787?ticket=ST-5796123-cWffYR5eaBPOg4IMyvRX-blog.camfuze.com
http://www.tecchannel.de/kommunikation/handy_pda/2034375/android_kalender_richtig_synchronisieren_mit_apps_widgets_tools/
http://www.webutation.net/go/review/free2doo.com
http://www.mangareader.net/546-44165-11/wolf-guy-ookami-no-monshou/chapter-33.html
http://www.swagbucks.com/p/games?gid=11
http://www.mangareader.net/546-45740-6/wolf-guy-ookami-no-monshou/chapter-39.html
http://www.sockshare.com/file/524894F366DF5EF8
http://yahoo-mbga.jp/stdgame/300103?_ref=aff%3Dyhm50103
http://abc.go.com/watch/once-upon-a-time/SH55126545/VD55245000/tallahassee
http://drhtv.tv/channel-2.html
http://camfuze.com/liv1987
http://www.break.com/usercontent/2008/10/tentacles-596801
https://www.spinandwin.com/wp-content/themes/spinwin/game_launch.php?gameid=4039
http://news.bbc.co.uk/2/hi/programmes/click_online/default.stm
http://www.freeandroidtv.com/play.php?channelId=929
http://yahoo-mbga.jp/game/12003026/play
http://www.channel5.com/shows/hatfields-mccoys/episodes/episode-2-458
http://camfuze.com/youdontloveme
http://www.funimation.com/psycho-pass/episode/nobody-knows-your-face/sub
http://www.chicagotribune.com/health/sc-fam-1030-extra-skinny-minnie-20121025,0,4303499.story
http://www.mirathewalkingdead.com/the-walking-dead-online-S3E04-Killer-Within.html
http://www.mangareader.net/546-31798-21/wolf-guy-ookami-no-monshou/chapter-14.html
http://www.coursesmart.com/go/offlinevideo
http://irokotv.com/video/2116/brides-war-2
https://www.spinandwin.com/wp-content/themes/spinwin/game_launch.php?gameid=6017
http://www.ea.com/uk/football/fifa-ultimate-team
http://www.worldstarhiphop.com/videos/video.php?v=wshhAYO40m4In2W83rLK
http://www.bet.com/video/sundaybest/season-5/highlights/sunday-best-511-highlight-s2.html
http://www.talkenglish.com/LessonDetails.aspx?ALID=2019
http://pocky.jp/enjoy/icon_generator/index.html
http://www.swagbucks.com/p/games
http://museums.bankofamerica.com/
http://www.itv.com/itvplayer/bychannel/?Filter=CITV
http://wly.efunfun.com/GameAccess/Login/gotogame.html?gc=wly&sc=wly30
http://www.pogo.com/games/poppitsprint#game
http://www.camfuze.com/wrenoir
http://www.the-saleroom.com/en-gb/auction-catalogues/capes-dunn/catalogue-id-2855799/live-bidding
http://seasonvar.ru/serial-5293-ZHemchuzhina_u_morya.html
http://pvblog.blog98.fc2.com/blog-entry-15061.html
https://www.google.com/search?q=c%C3%A1ch+ghi+b%C3%A0n+trong+pes13+ps3&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official
http://kultura.gazeta.pl/kultura/0,127932.html?pelna=tak
http://www.bbc.co.uk/sport/0/formula1/20199967
https://www.google.com/search?q=windows+8+malaysia+available+for+download&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official
https://www.google.com/search?q=how+to+install+lighttpd+on+nexus+7&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official
http://apps.facebook.com/jackpotpartycasino/index.php?appSource=1&state=5bf2baae6a4f3d5f59f0a7a784c03633&code=AQA3u0EPBte1E8a6uh14ACHWPqrZvTg3DR__eLAjv_YyjhvU5zREtybhunipSdsOEKQsbcL7nrQ0zd0eZHUGejkscKUxXlxsGnc0YCQVv4Cd9ICU8ZLKAVH5qt5akDJtLKGRho-EMPfltYahq
http://fenoxo.com/play/
http://www.zie.nl/video/auto/Dubbeltest-Audi-A3-vs.-Mercedes-Benz-A-klasse/m1nz273f2uys
Keywords: needURLs
Although the crashing thread has only one stack frame, most of the crashes have some useful-looking data in the other threads from the dump. The few I looked at all had a bunch of threads in libflashplayer, which combined with the discussion from bug 756068 makes me think this might be flash-related. However I would like to get the raw crash data and do some grep-analysis on it to see if this correlation holds up over a larger sample size of crash reports.
I agree there are a lot of Flash game/video sites on the list.
I grabbed the crash data dump for nov 19 at https://crash-analysis.mozilla.com/crash_analysis/20121119/ and ran a script [1] on it to get a count of how many times "libflashplayer" appears in the raw dump. Of the 241 crashes with the signature libdvm.so@0x45... there were 19 that did not mention libflashplayer. It's possible that these are some other unrelated crash but I took a look at a few of them (see [2] for examples) and they seem very similar so I'm hesitant to just discard them as unrelated.

[1]
(awk -F '\t' '/libdvm.so@0x45/ { print $3 }' 20121119-pub-crashdata.csv | while read url; do if [ ! -f ${url##*/} ]; then wget --no-check-certificate $url; fi; awk '/id="rawdump"/ { count=0; domatch=1 } /libflashplayer.so/ { if (domatch) count++ } END { print count }' ${url##*/} | xargs echo $url; done) | tee libflash.cnt

[2]
https://crash-stats.mozilla.com/report/index/24629d4a-4cdf-4f4d-b58d-82fe22121119
https://crash-stats.mozilla.com/report/index/7716c9a8-670e-49b0-9931-fa34e2121119
https://crash-stats.mozilla.com/report/index/a3a13def-56eb-48b4-959f-361952121119
https://crash-stats.mozilla.com/report/index/fa3f293b-7cef-49d5-b9ac-ba0512121119
Another theory I had from looking at a few of the crashes was that the stack on one of the threads was getting too large and possibly smashing other bits of memory. I poked around my nov 19 data set and found that 36/241 reports had no threads with more than 100 stack frames, so that doesn't seem to pan out either.
Crash Signature: data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab26] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab3e] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab3e] → data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab26] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab3e] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab3e] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf88a]…
There also does not appear to be any particular correlation between Fennec versions, devices, or uptimes. All the crashes are on Android API 16 or 17 (i.e. JellyBean) which is probably why Nexus 7 devices are over-represented in the crash stats. I will continue digging, but it could be helpful if somebody with a Nexus 7 or other JB device started visiting the list of URLs in comment 7 to see if any of them can reliably reproduce the crash.
Loaded all URL's in comment #7 on my Nexus 7 (Android 4.2.1) with Flash Player 11.1.for Android 4.0 (11.1.115.27), and without and was not able to reproduce any crash.
FYI, this is still the #1 topcrash on beta, now on 18.0b1 - is there any way we can get some traction there?
ASUS Transformer Pad TF300T
ASUS Transformer Pad TF700T
Acer A700
Asus Nexus 7
Samsung Nexus 10
Rockchip Android
Acer A700
Asus Transformer Prime TF201
Looking at comment 10 : 2 of the devices are Galaxy S III, 1 device is cynogenmod, the last is on 4.0.3 on asus Transformer Pad TF300T
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #14)
> FYI, this is still the #1 topcrash on beta, now on 18.0b1 - is there any way
> we can get some traction there?

Hopefully bug 815684 will get us some more actionable information.
Crash Signature: data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf88a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf88a] → data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf88a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf88a] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf8e2] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8e2]
The first crashes with this patch applied have started trickling in. KaiRo grabbed the logcat dump for me from crash report https://crash-stats.mozilla.com/report/index/dfeec177-4353-4d41-b86b-0b4572121203; it is attached. It seems to indicate that we are leaking entries in the JNI local ref table, or something along those lines.
snorp, any idea which JNI functions might be leaking the FlashPaintSurface references? Does the flash player have its own JNI functions that we don't have source for?

Summary:
361 of java.lang.Class (4 unique instances)
127 of java.lang.String (99 unique instances)
1 of org.mozilla.fennec.App
23 of com.adobe.flashplayer.FlashPaintSurface (23 unique instances)
(In reply to Kartikaya Gupta (:kats) from comment #19)
> snorp, any idea which JNI functions might be leaking the FlashPaintSurface
> references? Does the flash player have its own JNI functions that we don't
> have source for?

Flash does create the FlashPaintSurface itself via JNI. If you recall bug 790198, you mention that the local ref for surface of course does need to be freed. At that point in time, I think we weren't sure if Flash was doing that or not. It seems that it's not?

> 
> Summary:
> 361 of java.lang.Class (4 unique instances)
> 127 of java.lang.String (99 unique instances)
> 1 of org.mozilla.fennec.App
> 23 of com.adobe.flashplayer.FlashPaintSurface (23 unique instances)
Just an update: after spending most of the day gdb'ing the plugin code I found at least one or two problems. One easily-reproducible problem is that if I load the page at http://tfc.tv/Episode/Details/33042, every time I tap on the flash thing at the top one java.lang.Class<com.adobe.flashplayer.SystemCapabilities> local ref gets leaked on the NS_MOUSE_BUTTON_UP handler. If I do this enough times the table will eventually fill up and OOM.

I also found that the one instance of org.mozilla.fennec.App that is leaked comes from the kJavaContext_ANPGetValue case block in nsNPAPIPlugin.cpp. This code was also touched in bug 790198 as snorp pointed out above, and I'm 99% sure that we should be using a global ref here instead of a local ref (and updating any other pieces of code accordingly). The flash plugin must keep this ref as an opaque pointer, and hands it back to us when we request it, but otherwise shouldn't be doing anything with it (how could it? it's a GeckoApp instance after all, and it knows nothing about GeckoApp instances).

There are still other things getting leaked that I haven't been able to nail down yet. I think basically any time we call into one of the pluginFunctions there is the potential for flash to leak local refs, and we should probably guard against this by using AutoLocalJNIFrames everywhere (more or less in the same places we use PluginDestructionGuards). I don't know what kind of performance impact this will have.
Found that the t->callback in PluginTimerCallback was leaking another two java.lang.Class<com.adobe.flashplayer.SystemCapabilities> on the page at http://tfc.tv/Episode/Details/33042 so that should definitely get an AutoLocalJNIFrame as well.
Adding josh for more help here. I have some questions:

1) what is the relationship between nsNPAPIPlugin and nsNPAPIPluginInstance? From code inspection it looks like there will be one instance of nsNPAPIPlugin for each type of plugin (e.g. flash) and one instance of nsNPAPIPluginInstance for each actual plugin object (e.g. a flash video or ad on a page). Can you confirm this?

2) Is a PluginDestructionGuard supposed to be created around any invocation of plugin code? If not, is there a comprehensive list of entry points to invoking plugin code?

As a summary of what's going on this bug: there are places where we call into flash (I found nsNPAPIPluginInstance::HandleEvent and PluginTimerCallback as specific examples where this happens) and the flash code that runs leaks some entries in the JNI local ref table. If we don't guard against this eventually the table fills up and dalvik aborts. I don't know all of the pieces of flash code that does this, so the safest assumption is that any time we call into flash this might happen. So I would like to guard all entry points into flash with an AutoLocalJNIFrame, but it seems like there's no easy way to identify all of the entry points. The use of PluginDestructionGuard seems like the closest match but it is missing in places like PluginTimerCallback and DelayedReleaseGCCallback so I'm hoping there's something better.
Flags: needinfo?(joshmoz)
(In reply to Kartikaya Gupta (:kats) from comment #23)

> 1) what is the relationship between nsNPAPIPlugin and nsNPAPIPluginInstance?
> From code inspection it looks like there will be one instance of
> nsNPAPIPlugin for each type of plugin (e.g. flash) and one instance of
> nsNPAPIPluginInstance for each actual plugin object (e.g. a flash video or
> ad on a page). Can you confirm this?

Yes. There will be one nsNPAPIPlugin for Flash, and one nsNPAPIPluginInstance per instance of Flash.

> 2) Is a PluginDestructionGuard supposed to be created around any invocation
> of plugin code? If not, is there a comprehensive list of entry points to
> invoking plugin code?

I don't recall the details of this, I can look at the code today and let you know if you need me to. I've cc'd John Schoenick, who may know more about this. Same for Benjamin Smedberg.
Flags: needinfo?(joshmoz)
(In reply to Josh Aas (Mozilla Corporation) from comment #24)
> I don't recall the details of this, I can look at the code today and let you
> know if you need me to. 

That would be very helpful, thanks!
Whiteboard: [native-crash] → [native-crash][summary in comment 23]
Crash Signature: data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8e2] → data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8e2] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf8c6] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8c6]
Any updates here Josh? We're in the end run of Firefox 18, and coming up on the holidays, so traction on code inspection for this top crasher would be really helpful. Thanks!
Flags: needinfo?(joshmoz)
Crash Signature: data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8e2] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf8c6] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8c6] → data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8e2] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf8c6] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf8c6] [@ data@app@org.mozilla.firefox-1.apk@classes.dex@0xaab3a] [@ …
PluginDestructionGuard is used in many places where we aren't calling into the plugin, which iirc is the case you're interested in.

The bulk of our calls into the plugin are in nsNPAPIPluginInstance.cpp - if you look for PluginDestructionGuard just in that file you'll see most of them:

nsNPAPIPluginInstance::HandleEvent
nsNPAPIPluginInstance::GetValueFromPlugin
nsNPAPIPluginInstance::Start
nsNPAPIPluginInstance::SetWindow
nsNPAPIPluginInstance::Print

You can look at the code that actually calls into the plugin and search for it in other places if you need to be more thorough.

Please flag me for review on a patch.
Flags: needinfo?(joshmoz)
This one was our fault; we provided a local ref to the plugin as the java context - this seems to never be deleted and gets leaked. Under some conditions I think this might actually also result in memory leaks in Fennec (e.g. if the infamous "don't keep activities" developer option is checked) but I haven't tested that.
Requesting feedback here - I'm not sure what exactly the PluginLibrary functions do, but I noticed that there was an existing AutoLocalJNIFrame wrapper around the library->NPP_New call in ::Start() so I assume that other PluginLibrary functions can also invoke the plugin code. Therefore I ended up wrapping all of them.

Also I don't know what threads these might run on; so far I'm assuming they all run on the main thread but if any can run on other threads (maybe the I/O notification entry points in nsNPAPIPluginStreamListener.cpp?) then I will need to ensure that the right JNIEnv object is passed in to the AutoLocalJNIFrame.

Other than that this patch needs a whole lot of testing.
Attachment #690751 - Flags: feedback?(joshmoz)
Keywords: qawanted
Ping. Would like feedback from somebody (anybody!) who knows the plugin code and answer the questions in comment 29.
Flags: needinfo?(benjamin)
I don't know the android-specific details at all.
Flags: needinfo?(benjamin)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #29)
> Created attachment 690751 [details] [diff] [review]
> WIP - (2/2) - Guard all plugin entry points with a AutoLocalJNIFrame
> 
> Requesting feedback here - I'm not sure what exactly the PluginLibrary
> functions do, but I noticed that there was an existing AutoLocalJNIFrame
> wrapper around the library->NPP_New call in ::Start() so I assume that other
> PluginLibrary functions can also invoke the plugin code. Therefore I ended
> up wrapping all of them.
> 
> Also I don't know what threads these might run on; so far I'm assuming they
> all run on the main thread but if any can run on other threads (maybe the
> I/O notification entry points in nsNPAPIPluginStreamListener.cpp?) then I
> will need to ensure that the right JNIEnv object is passed in to the
> AutoLocalJNIFrame.
> 
> Other than that this patch needs a whole lot of testing.

My understanding is that everything happens on the main thread, including I/O. So I think this patch should be alright. I'll test it some today. Does it for sure fix the local ref leaks, though?
I also need to know about the PluginLibrary functions. If I don't need to guard the PluginLibrary functions then the patch can be much simpler because I can move the AutoLocalJNIFrame into the NS_TRY_SAFE_CALL* macros and deal with the couple of exceptions manually.

It for sure fixes the local ref leaks that I could reproduce using the STR in comments 21 and 22. I assume that many other leaks are caused by similar issues and will also be fixed by this, but I don't know that for sure. My expectation is that crash volume will drop significantly with these patches but not go away entirely, because there are probably other (non-plugin) pieces of code that also leak refs.
I see at least a few PluginLibrary calls in nsPluginHost, so maybe we should just guard in PluginPRLibrary directly.
Comment on attachment 690749 [details] [diff] [review]
WIP - (1/2) - Make the java context a global ref instead of a local ref

>       LOG("get context");
>+      nsNPAPIPluginInstance *inst = (nsNPAPIPluginInstance *) (npp ? npp->ndata : nullptr);
>+      if (!inst)
>+        return NPERR_GENERIC_ERROR;

This patch doesn't actually work right because npp comes in as null here. Also this gets called only once in Fennec's lifetime rather than once per flash instance like I thought. So this patch is actually not right at all. I'm just going to obsolete it; the one local ref leaked here shouldn't be a huge deal.
Attachment #690749 - Attachment is obsolete: true
I updated the patch to move the guards into the NS_TRY_SAFE_CALL* macros and the few places in PluginPRLibrary that seem to invoke callback functions into the plugin code. Tested on Fennec to make sure all is still good since the original version of the patch, and it seems to be. Pushed to try to make sure it builds everywhere [1]. I had an earlier version [2] that I ran all the tests on and it was greenish so I'm not gonna do that again in the interests of saving try resources.

[1] https://tbpl.mozilla.org/?tree=Try&rev=86933ea01366
[2] https://tbpl.mozilla.org/?tree=Try&rev=f0068d981b62
Attachment #690751 - Attachment is obsolete: true
Attachment #690751 - Flags: feedback?(joshmoz)
Attachment #691505 - Flags: review?(snorp)
Attachment #691505 - Flags: feedback?(joshmoz)
Re-pushed to try, this time not on top of an already busted cset.

https://tbpl.mozilla.org/?tree=Try&rev=5b7e91c5aa5b
Comment on attachment 691505 [details] [diff] [review]
Guard against plugins leaking local refs

Review of attachment 691505 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me
Attachment #691505 - Flags: review?(snorp) → review+
Comment on attachment 691505 [details] [diff] [review]
Guard against plugins leaking local refs

[Approval Request Comment]
Bug caused by (feature/regressing bug #): none in particular
User impact if declined: fennec crashes after viewing a bunch of flash content. it may take a while for this to manifest depending on how often the leaky code is called, and which flash instances invoke it. note that i'm not sure how many of the crashes will actually be fixed by this patch.
Testing completed (on m-c, etc.): locally. would like to bake it on nightly for as long as possible before uplifting for beta 5.
Risk to taking this patch (and alternatives if risky): patch affects android only. I would say it's somewhat risky because I don't fully understand when some of these code paths get executed and I was only able to exercise some of them. In particular there may be regressions if the threading model is not what I expected it to be. However I think much of the risk can be mitigated just by adequate testing - the best way to test this would be to visit lots of pages with flash content (preferably without restarting the browser in between) and generally interact with flash as much as possible.
String or UUID changes made by this patch: none
Attachment #691505 - Flags: approval-mozilla-beta?
Attachment #691505 - Flags: approval-mozilla-aurora?
As a note to myself in the future: there appear to be another set of entry points into flash code in the JS wrapper object goop. For instance, the the crash stack at https://crash-stats.mozilla.com/report/index/4e8ea89f-55b2-4d00-9776-cab032121201 contains the following frames:

52  libflashplayer.so  libflashplayer.so@0x53b60b 	
53  libxul.so          mozilla::plugins::parent::_releaseobject nsNPAPIPlugin.cpp:1448
54  libxul.so          DelayedReleaseGCCallback 	        nsJSNPRuntime.cpp:213
55  libxul.so          XPCJSRuntime::GCCallback 	        XPCJSRuntime.cpp:758

which indicates that the GC is calling into flash code via the NPObject->NPClass->deallocate code. Hopefully those code paths aren't leaking JNI refs but that might be a place to look if this problem persists.
https://hg.mozilla.org/mozilla-central/rev/8435a5715fa7
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 20
Attachment #691505 - Flags: feedback?(joshmoz) → feedback+
Comment on attachment 691505 [details] [diff] [review]
Guard against plugins leaking local refs

Approving for Aurora in preparation for uplift to Beta on Tuesday pending positive tester/QA feedback.
Attachment #691505 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Crash Signature: data@app@org.mozilla.firefox-2.apk@classes.dex@0xaab3a] → data@app@org.mozilla.firefox-2.apk@classes.dex@0xaab3a] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaf902] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaf902]
Thanks for landing the patch for me! :)
Due to the low amount of data and the constantly changing binaries, it's pretty hard to verify this on Nightly or Aurora, but at least I haven't seen any crashes with those signatures since the patch landed.

I think we're good to go for taking this on beta as well.
Comment on attachment 691505 [details] [diff] [review]
Guard against plugins leaking local refs

Considering feedback in comment 46, approving the patch for beta
Attachment #691505 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
The amount of those crashes is down on 18.0b5, but unfortunately we still have a good number of crashes with the libdvm.so@45c90 signature, see https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2012-12-27&signature=libdvm.so%400x45c90&version=FennecAndroid%3A18.0b5
Yeah, I'm not sure why those crashes are still happening on beta (and to a lesser extent, on aurora). On nightly it looks like those crashes have stopped completely since I landed this patch. The logcat-dump-to-crash-reports patch is only in nightly so far so I can't look at the logcat of the crashes in beta/aurora to diagnose them. :( It's also too late to really consider speculative fixes for 18.
Marking this as verified for the reduction. I will open a new bug for the remaining crashes.
Status: RESOLVED → VERIFIED
Blocks: 825996
Restrict Comments: true
Product: Firefox for Android → Firefox for Android Graveyard

Removing steps-wanted keyword because this bug has been resolved.

Keywords: steps-wanted

Removing steps-wanted keyword because this bug has been resolved.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: