Fixing drag and drop in Electron
Drag and drop from Electron apps to other applications is broken. One possible way to work around it is by writing native Node modules replacing Electron’s implementation. That way we can support dragging multiple files out of our app with full support for modifier keys. This article describes how to do this for Windows and MacOS.
But be warned, you have to want drag and drop really, really badly to go down this route. It is a lot of work, as it involves use of native Node modules, the Win32 and Cocoa APIs, and the C, C++ and Objective-C languages. And in the end it’s still not quite perfect. If that’s not enough to deter you, do read on to embark on a journey to the heart of darkness.
Setting the scene
For a file browser like Fileside, drag and drop is a crucial core feature. It needs to support dragging of any selection of files and folders to other panes within its own window, as well as out to other applications running on the computer. Likewise, it needs to accept files dragged into it from external programs like Finder or Explorer.
Under the hood, Fileside uses GitHub’s Electron framework, which allows you to build cross-platform desktop applications using web technologies. Electron is essentially a Chromium browser instance which has been retro-fitted with Node.js and a few Electron-specific APIs for interacting with the desktop OS on which it runs.
Drag and drop in the browser
Dragging files around between different panes within the app can be adequately implemented using the constructs made available by the HTML5 drag and drop API.
To initiate a drag, we listen for the dragstart event on a DOM element which has been marked as draggable, in response to which we fill in its dataTransfer property with information about the data to be dragged and what to display as a drag image.
To receive a drop, we register for the dragover, dragenter, dragleave and drop events on another DOM element. There is some ballroom dancing involved and a few gotchas to grok (the most annoying one being having to count the number of entries and exits in order to highlight a drop target under a dragged item), but since we only have one browser to worry about, this is manageable.
By design, a web browser does not allow drags to leave a web page and enter the world of the operating system underneath. Electron does extend Chromium with this functionality, in the form of the webContents.startDrag function, but unfortunately it comes with some limitations.
What’s the problem?
Using Electron’s startDrag, the following caveats apply:
- It’s only possible to drag one file at a time on Windows.
- It’s not possible to hold down modifier keys (Alt, Cmd, etc.) to control the type of the drag (copy, move, link, etc.), and the mouse cursor always shows a plus icon.
This isn’t really going to cut it for an application dedicated to managing files.
To work around these limitations, we will take over the initiating side of a drag with our custom implementation of startDrag. The receiving side, i.e. the drop handling, can remain as a standard HTML5 implementation.
At this point, I should mention that another option would be to fork the Electron codebase itself, and modify it to fit our needs. However, since this adds the extra burden of having to maintain our own custom version of Electron, it should really be seen as a last resort.
Going native
What does going native mean in the context of an Electron app? There are two different options depending on how native you want to go.
You can either use a foreign function interface (FFI) wrapper, obviating the need to get your hands dirty with C++ code, or write a C++ native Node module talking to the OS directly.
Diet native - Foreign function interface
A foreign function interface acts as a bridge from one language or runtime environment to another. The FFI approach ought to be the quicker route to native, hence it makes sense to try it first.
There are a few NPM packages that make native APIs available to Node applications through this technique. The two I’ve spent some time investigating are NodeRT for Windows and objc for Mac.
Out of the two, I’ve so far got a good impression of objc, despite it being very young and more or less experimental. I previously used it to access the MacOS Trash API successfully.
NodeRT, I found poorly documented, and quite a pain to get up and running. There are many hidden assumptions, strict version requirements, config files with hard-coded paths to particular Visual Studio installs etc. Extensive fiddling was required just to get it to build. In addition, the relatively new WinRT API whose functionality it exposes does not seem to be widely used, and its version of the drag and drop API is quite thin on documentation if you’re trying to use it outside of UWP.
Skipping this extra layer of somebody else’s code for the flexibility of the fully native approach seemed to be the most sensible way forward at this point.
Full fat native - C++ Node module
Time to roll up our sleeves and learn how to write native Node modules, or C++ addons as they are called in the Node documentation.
This article will not go into detail on the mechanics of native Node modules themselves, as there are already plenty of good tutorials available to get up and running. Here are a few that I found useful:
- Writing Native Node.js Modules
- Tutorial to Native Node.js Modules with C++
- Beginners guide to writing NodeJS Addons using C++ and N-API (node-addon-api)
However, even in the native module world there are a few different ways of doing things. Node handles the plumbing, in that it allows you to simply require a compiled module (essentially a DLL on Windows and a dylib on Mac, only with the file extension .node instead) from JavaScript code. But on the C++ side, we have a selection of different libraries to choose from for converting values between JavaScript and C++.
You can either do it by using the JavaScript engine V8’s APIs directly, by using a wrapper called NAN (Native Abstractions for Node.js), or by using the newer N-API wrapper. Most native module projects have been using NAN up until now, but N-API is currently the officially recommended way for new projects. So I went with N-API.
It comes in two forms. Either as a pure C API or as a C++ wrapper called node-addon-api, letting you work at a slightly higher level. We’ll be using node-addon-api.
A good first step into the native module waters is getting a Hello World module to compile and run. The GitHub repo node-addon-examples has a very helpful one.
The tool used for compiling native Node modules is called node-gyp. Compared to trying to get NodeRT to build, working with node-gyp is a breeze. It has a lot of intelligent defaults, and has been designed to figure things out on its own depending on what’s already present on your system. It’s also clever enough to download and install the Node header files that it needs, so you don’t have to worry much about dependency management.
Electron specifics
Because each Electron version is tied to a specific version of Node, it’s important to use the same version of Node when building the native module. If that’s not possible, see Electron’s documentation on the topic for the available options.
Webpack woes
If you’re using Webpack, you need to take some extra steps to integrate your new module into your project. If you’re not, feel free to skip this section.
After some trial and error, I found the Webpack loader electron-native-loader, which is specifically designed for integrating native modules into Electron projects. The following modifications to your Webpack configuration will be needed.
1. Add electron-native-loader as a loader for .node files.
```js
module: {
  rules: [
    {
      test: /\.node$/,
      use: "electron-native-loader"
    }
  ]
}
```
2. Use the CopyWebpackPlugin to copy the compiled .node files from their project build folders to the Webpack output folder.
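```js
// At the top of the Webpack config file
const os = require("os");
const CopyWebpackPlugin = require("copy-webpack-plugin");

// ...and in the exported configuration object.
// The array-of-patterns argument is the copy-webpack-plugin v5 (and earlier)
// form; v6 and later expect { patterns: [...] } instead.
plugins: [
  new CopyWebpackPlugin(
    os.platform() === "darwin"
      ? [{
          from: "src/native/mac/build/Release/mac.node",
          to: "native/mac.node"
        }]
      : [{
          from: "src/native/win/build/Release/win.node",
          to: "native/win.node"
        }]
  )
]
```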
3. Specify the exact require statements used in the code as externals.
```js
externals: {
  "native/win": 'require("./native/win")',
  "native/mac": 'require("./native/mac")',
}
```
The author of electron-native-loader also provides the electron-native-plugin and the electron-native-patch-loader, which he recommends using together with electron-native-loader, but for my particular setup, it was easier to just use the above approach.
The real work can begin
Finally, we can move on to focusing on our actual goal, drag and drop. As already mentioned, we only need to worry about drags that are initiated from within our app for the native implementations.
High-level plan
Here’s the outline of what we need to do.
In Electron renderer process
- Specify an element as draggable and register for dragstart events.
- Handle dragstart and call preventDefault().
- Prepare an array with the file paths to drag.
- Create a drag image representing the dragged content.
- Pass the paths and the drag image to the main process via IPC.
In Electron main process
- Require the native module.
- Get the native handle of the browser window from Electron’s getNativeWindowHandle().
- Call the native module with the paths array, drag image and window handle.
In native module
- Parse arguments coming in from JavaScript.
- Prepare the drag payload as required by the OS.
- Set up any listeners or delegates required.
- Call the OS’s native API for initiating a drag operation.
- Communicate the result of the drag back to the Electron app.
The module API
Our native module only needs to expose one function, startDrag, to JavaScript.
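```ts
// NativeDragResult is a plain string describing whether, and how, a drop happened
export type NativeDragResult = string;

export interface NativeModule {
  startDrag: (
    winHandle: Uint8Array,
    files: string[],
    dragImage?: Uint8Array,
    width?: number,
    height?: number
  ) => NativeDragResult;
}
```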
As we shall see, dragImage, width and height will only be used on Windows, so they are optional. The Uint8Arrays are plain byte arrays which will need some interpretation on the C++ side. The NativeDragResult is just a string used to communicate whether a drop happened and what kind of drop it was.
The internals of this function vary depending on the operating system.
Marshalling and interpretation of arrays
But before we get into the specifics of Windows and MacOS respectively, let’s look at some particulars around how we convert (or marshal as the computer scientists like to call it) values across the JavaScript-C++ boundary.
It’s not immediately obvious how to use N-API to turn the arrays into C++ equivalents so I will share the code here. (I ended up mostly using N-API’s plain C API for this as I ran out of patience trying to figure out how to do it with the C++ wrapper.)
To convert a Uint8Array into an unsigned char*:
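```cpp
#include <napi.h>

void ParseUint8Array(Napi::Env env, Napi::Value array)
{
    napi_typedarray_type type;
    size_t length;
    void* data;
    napi_value arrayBuffer;
    size_t byteOffset;
    napi_status s = napi_get_typedarray_info(
        env, array, &type, &length, &data, &arrayBuffer, &byteOffset);
    // Only accept an actual Uint8Array (a Node Buffer also qualifies);
    // other typed arrays have different element sizes
    if (s == napi_ok && type == napi_uint8_array)
    {
        // data already points at the first element (byteOffset applied),
        // and for a Uint8Array, length is also the number of bytes
        unsigned char* bytes = static_cast<unsigned char*>(data);
        // Do something with the bytes...
    }
}
```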
To read a string array into an STL vector of wide strings:
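```cpp
#include <napi.h>
#include <codecvt>
#include <locale>
#include <string>
#include <vector>

std::vector<std::wstring> ParseArray(Napi::Env env, Napi::Value array)
{
    std::vector<std::wstring> wideStrings;
    uint32_t numStrings = 0;
    napi_get_array_length(env, array, &numStrings);
    // codecvt_utf8_utf16 yields UTF-16, including surrogate pairs, when wchar_t
    // is 16 bits (as on Windows). std::wstring_convert is deprecated since C++17.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    for (uint32_t i = 0; i < numStrings; ++i)
    {
        napi_value napiValue;
        napi_get_element(env, array, i, &napiValue);
        Napi::String napiString(env, napiValue);
        std::string utf8String = napiString.Utf8Value();
        wideStrings.push_back(convert.from_bytes(utf8String));
    }
    return wideStrings;
}
```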
The wide strings are needed to call Windows APIs that take WCHAR strings. On Mac, we don’t need the final conversion step; we can stop once we have the UTF-8 strings.
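Whichever facility performs the UTF-8 to wide-string conversion, Windows expects UTF-16, in which characters outside the Basic Multilingual Plane (an emoji in a filename, say) take two code units, a surrogate pair. std::codecvt_utf8<wchar_t> only produces UCS-2 when wchar_t is 16 bits, std::codecvt_utf8_utf16<wchar_t> does handle surrogates, and std::wstring_convert is deprecated as of C++17; the usual Win32 route is MultiByteToWideChar with CP_UTF8. To show what the conversion actually has to do, here is a portable sketch of a UTF-8 to UTF-16 decoder. Utf8ToUtf16 is a hypothetical helper, and it rejects malformed input rather than substituting replacement characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-16 code units, producing a
// surrogate pair for every code point above U+FFFF. Malformed input
// (bad lead or continuation bytes, truncated sequences, overlong forms,
// encoded surrogates) yields std::nullopt rather than a guess.
std::optional<std::u16string> Utf8ToUtf16(const std::string& in)
{
    static const uint32_t kMinForLength[] = {0, 0x80, 0x800, 0x10000};
    std::u16string out;
    size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        uint32_t cp = 0;
        size_t extra = 0; // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return std::nullopt; // not a valid lead byte

        if (in.size() - i <= extra) return std::nullopt; // truncated sequence
        for (size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) return std::nullopt;
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong encodings, values past U+10FFFF and lone surrogates
        if (cp < kMinForLength[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return std::nullopt;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            cp -= 0x10000; // split the remaining 20 bits across two surrogates
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the result can be turned into a std::wstring with std::wstring(u16.begin(), u16.end()).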
Worth noting as well is how to convert the winHandle reference from the bytes array of unsigned char produced above to an HWND and an NSView* respectively.
Windows
```cpp
// The buffer holds a pointer-sized handle, so read it back as a full HWND
HWND hwnd = *reinterpret_cast<HWND*>(bytes);
```
Mac
```objc
NSView* view = *reinterpret_cast<NSView**>(bytes);
```
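The buffer returned by getNativeWindowHandle() holds a pointer-sized value (8 bytes in a 64-bit build), while unsigned long is only 32 bits on 64-bit Windows, so the handle is best read back at full pointer width. Dereferencing a reinterpret_cast byte pointer also assumes the buffer is suitably aligned. Here is a minimal portable sketch that copies the bytes out with memcpy instead; HandleFromBytes is a hypothetical helper, and its result would still need casting to HWND or NSView* on the respective platform.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: read a native handle (HWND on Windows, NSView* on Mac)
// out of the byte buffer produced by getNativeWindowHandle().
// Returns nullptr if the buffer is too short to hold a pointer.
void* HandleFromBytes(const unsigned char* bytes, size_t length)
{
    void* handle = nullptr;
    if (bytes == nullptr || length < sizeof(handle))
        return nullptr;
    // memcpy sidesteps the alignment and strict-aliasing concerns that
    // dereferencing a reinterpret_cast of an arbitrary byte pointer raises
    std::memcpy(&handle, bytes, sizeof(handle));
    return handle;
}
```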
The Windows module
The time has come to make a deep dive into the ancient caves of desktop operating system APIs. We’ll start with the big bad Windows dragon and its infamous Win32 API.
The Win32 API was launched with a lot of fanfare as far back as 1992, and still forms the backbone of Windows. Sure, Microsoft has added various technologies on top of it over the years, like MFC, .NET, and more recently UWP and WinRT, but Win32 is still the canonical way of interfacing with the Windows operating system from C++.
It has its origins in an era when C was considered a high-level language, and seems to be the result of a disparate team of developers who never talked to each other and who each had their own ideas about how things ought to be done. It is truly an instrument of torture. But since the ancient sages teach us that the path to enlightenment leads through suffering, we shall embrace our fate and soldier on.
A good starting point is Microsoft’s own extensive documentation on drag and drop. While not bad, it always seems to stop just short of providing that final crucial detail that’s needed to get things to work.
The shoulders of giants
So like any self-respecting developer, I started off by googling for an existing solution, and soon came across a 2002 Code Project article with attached sample code, seemingly offering the quickest route to a proof of concept.
The demo project still compiles, but it is based on MFC and ends up making calls to functions such as AfxGetMainWnd(), which just merrily crash when run inside a Node module. However, this project still offered a vital clue to our final solution, namely the preparation of the DROPFILES structure.
Prepare DROPFILES
DROPFILES is the name of a C struct that contains the list of file paths to include in the drag. As a visitor from 2019, you’d be forgiven for thinking that specifying a list of files would involve maybe creating an array of strings or something equally rational and sane, but no, we’re now in the dark ages, and the way this needs to be done involves the following black magic:
1. Add up the number of characters of all the file paths.
2. Add 1 for a separator between each path and 2 for a terminator at the end.
3. Allocate memory for the DROPFILES struct itself.
4. Allocate room for the number of (wide) characters from step 2 just past the end of the DROPFILES struct. (In practice, steps 3 and 4 are one single allocation. And not just any old RAM: it has to be allocated using the GlobalAlloc call rather than the standard malloc.)
5. Copy the concatenated file paths, separated by null characters and ending with two null characters, into the dangling memory allocated.
6. Assign to DROPFILES.pFiles the offset from the starting address of DROPFILES to the beginning of the memory chunk containing the file paths.
With the aid of the STL vector and wstring classes for at least a semblance of modern convenience, this ends up looking a little something like this:
```cpp
#include <windows.h>
#include <shlobj.h>   // DROPFILES
#include <strsafe.h>  // StringCchCopyW
#include <string>
#include <vector>

DROPFILES* CreateDropFiles(const std::vector<std::wstring>& files)
{
    size_t numChars = 0;
    for (const auto& path : files)
    {
        numChars += path.length() + 1; // +1 for terminating \0
    }
    // Add 1 extra for the final extra \0
    numChars += 1;

    size_t bufferSize = sizeof(DROPFILES) + (sizeof(wchar_t) * numChars);

    // Allocate zero-initialised memory from the heap. GPTR is
    // GMEM_FIXED | GMEM_ZEROINIT, so the handle is usable as a pointer.
    HGLOBAL hGlobal = GlobalAlloc(GPTR, bufferSize);
    if (hGlobal == NULL)
    {
        return NULL;
    }

    // Point pDrop to this memory
    DROPFILES* pDrop = (DROPFILES*)hGlobal;

    // pFiles is the offset from the beginning of the struct where the
    // file list starts. Yes, it's just a bit of RAM tacked onto the end
    // of the DROPFILES struct.
    pDrop->pFiles = sizeof(DROPFILES);
    // Set the Unicode flag
    pDrop->fWide = TRUE;

    // Copy all the filenames into memory after the end of the DROPFILES struct
    wchar_t* pBuf = (wchar_t*)(LPBYTE(pDrop) + sizeof(DROPFILES));
    size_t remaining = numChars; // space left in the buffer, in characters
    for (const auto& path : files)
    {
        StringCchCopyW(pBuf, remaining, path.c_str());
        pBuf += path.length() + 1; // skip past the path and its \0
        remaining -= path.length() + 1;
    }
    // The final extra \0 is already in place thanks to GMEM_ZEROINIT
    return pDrop;
}
```
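To make the memory layout that CreateDropFiles writes after the DROPFILES header concrete, here is a portable sketch that builds only the file-list part, as a vector of UTF-16 code units. It uses char16_t in place of wchar_t so that it compiles on any platform, and BuildFileList is a hypothetical name; on Windows, the same units would end up at the address given by pFiles.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build the double-null-terminated file list that
// follows the DROPFILES header, e.g. "C:\a.txt\0C:\b.txt\0\0".
std::vector<char16_t> BuildFileList(const std::vector<std::u16string>& files)
{
    std::vector<char16_t> list;
    for (const auto& path : files)
    {
        list.insert(list.end(), path.begin(), path.end());
        list.push_back(u'\0'); // terminates (and separates) this path
    }
    list.push_back(u'\0'); // the second null marks the end of the list
    return list;
}
```

For two 8-character paths, 8 + 8 characters plus 1 separator and 2 terminating nulls gives 19 code units (38 bytes) after the header, the same total that the numChars arithmetic in CreateDropFiles arrives at.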
Diving deeper
Now, since MFC won’t work, we need to hunt for some code showing how to initiate the drag and drop using only Win32 functions. Time to call the pros.
Raymond Chen is a near-legendary Microsoft developer who’s been part of the Windows team since the early 90s, and apparently still is. He’s been publishing a few blog posts per week for more than 20 years at his blog The Old New Thing, a treasure trove for all things Win32. The pain of having to use Win32 is somewhat lessened by Raymond’s irreverent and light-hearted style, and he’s got some amazing stories to boot.
Regarding drag and drop, he has written two article series on the topic.
Once we’ve managed to track down the earlier blog post currently referenced via a broken link from “Dragging a shell object”, featuring the GetUIObjectOfFile function, we realise that there is more than one way to skin this particular cat too. For Raymond doesn’t bother with a DROPFILES struct at all, instead opting for an approach consisting of this inscrutable bit of code:
```cpp
HRESULT GetUIObjectOfFile(HWND hwnd, LPCWSTR pszPath, REFIID riid, void **ppv)
{
    *ppv = NULL;
    HRESULT hr;
    LPITEMIDLIST pidl;
    SFGAOF sfgao;
    if (SUCCEEDED(hr = SHParseDisplayName(pszPath, NULL, &pidl, 0, &sfgao))) {
        IShellFolder *psf;
        LPCITEMIDLIST pidlChild;
        if (SUCCEEDED(hr = SHBindToParent(pidl, IID_IShellFolder,
                                          (void**)&psf, &pidlChild))) {
            hr = psf->GetUIObjectOf(hwnd, 1, &pidlChild, riid, NULL, ppv);
            psf->Release();
        }
        CoTaskMemFree(pidl);
    }
    return hr;
}
```
To summarise, what’s happening here is that the out parameter called ppv gets assigned an object holding a type of reference to a file known as an item ID list, derived from the supplied pszPath parameter. This object can then be used to start the drag in place of one containing DROPFILES. However, since this approach seems likely to have to hit the disk for each path we want to include, we will stick to the DROPFILES approach.
The waters are clearing
Reading on, it becomes clear that the key components of a Win32 drag are:
- The DoDragDrop function
- The IDataObject interface
- The IDropSource interface
To call DoDragDrop, we need an instance of both an IDataObject and an IDropSource. The drop source is a kind of delegate object controlling certain aspects of the drag, and the data object is what contains the payload to drag. This is the object that will hold our lovingly crafted DROPFILES struct.
The IDataObject
As it happens, the object masquerading as void **ppv (because why wait for the annual obfuscation contest to get an outlet for your sadistic tendencies) in the parameter list of GetUIObjectOfFile above is in fact an instance of an IDataObject, saving us the hassle of writing one ourselves. But since we want to avoid the overhead of querying the disk for each file path conversion, we need a different way to create the data object.
IDataObject is a specification for a general-purpose container for clipboard and drag data of any kind, and can thus hold many different types of data in different formats. To create an object adhering to the specification, we need to implement methods like SetData and GetData, along with others for querying and enumerating the data types it holds.
Fortunately, it turns out there’s a shortcut here as well, meaning we don’t have to write this entirely from scratch. Some further googling turned up the DragDropVisuals sample project, in which we find a DataObject.cpp that recruits a function called SHCreateDataObject for the heavy lifting. This function is intended for creating a data object from a list of item IDs, but apparently you can trick it into returning a general-purpose data object by passing null for most of its parameters.
The data object from the sample project is not yet quite fit for our purposes, however. For some reason, its EnumFormatEtc method states that it supports only Unicode text payloads, which is a brazen lie, since the data object created by SHCreateDataObject can store and return any data type. So we need to change EnumFormatEtc to just hand over to the internal data object:
```cpp
IFACEMETHODIMP CDataObject::EnumFormatEtc(DWORD dwDirection, IEnumFORMATETC **ppEnumFormatEtc)
{
    return _pdtobjShell->EnumFormatEtc(dwDirection, ppEnumFormatEtc);
}
```
With this modification, we can put DataObject.cpp to use in our own solution.
The IDropSource
To create our initial drop source, we can just copy the one Raymond provides here. We will need to make some tweaks later but this is good enough to get going.
Plan of action
Armed with this knowledge, we can now put together a plan of what startDrag needs to look like inside of the Windows module.
- Parse input arguments.
- Create an instance of our data object.
- Add the list of files and the drag image to the data object.
- Create an instance of our drop source.
- Call DoDragDrop with the data object and the drop source.
Adding the files to the data object
To add our DROPFILES struct to the data object, we need to cast the following spell:
```cpp
DROPFILES* pDrop = CreateDropFiles(files);

// Prepare FORMATETC and STGMEDIUM to set up a file drag and drop
FORMATETC format = { CF_HDROP, NULL, DVASPECT_CONTENT, -1, TYMED_HGLOBAL };
STGMEDIUM medium;
medium.tymed = TYMED_HGLOBAL;
medium.hGlobal = pDrop;
medium.pUnkForRelease = NULL;

// Create the IDataObject and give it the data
IDataObject* pDataObj = new CDataObject();
BOOL releaseMem = TRUE;
HRESULT hr = pDataObj->SetData(&format, &medium, releaseMem);
if (FAILED(hr))
{
    GlobalFree(pDrop);
}
```
Here, we need to bring in further cryptic structs in the form of FORMATETC and STGMEDIUM. These are used to tell the data object what kind of data it is we are giving it. The third parameter to SetData specifies whether the data object should release the memory for our added data payload. We set it to true, which means we only have to call GlobalFree for pDrop in case the SetData call fails.
Adding the drag image to the data object
Unfortunately, few things are straightforward when working with Win32, and adding the drag image is no different. The pixel data passed through from Electron in the form of a Uint8Array needs some careful massaging to shape it into the particular form the IDataObject requires.
To set the image, we need to convert it into an HBITMAP and add it to a structure called SHDRAGIMAGE, which we can then include in our data object through the use of the IDragSourceHelper helper object.
Our earlier N-API-assisted conversion of the dragImage parameter left us with an unsigned char* of bytes representing the image. Each 4-byte sequence in this array is one pixel made up of its R, G, B and A components respectively. Our job is now to turn this sequence of bytes into an HBITMAP.
After barking up various more or less misinformed trees, some further research led me to this example code from the LodePNG project. It decodes a PNG into a byte array and then converts the raw bytes into a BMP, which is pretty much the same format used for an HBITMAP.
The encodeBMP function here is just the ticket. But since it prepares output ready for writing to disk, it contains some extraneous header data, which in our case is specified separately as part of the SHDRAGIMAGE structure. We also need to modify it to expect RGBA rather than just RGB. Thankfully, this was easy due to the foresight of the original developer. Here’s what we end up with to rearrange our pixels to fit the HBITMAP format:
```cpp
void EncodeBmp(std::vector<unsigned char>& bmp, const unsigned char* image, int w, int h)
{
    // Bytes per pixel used
    int inputChannels = 4;
    int outputChannels = 4;

    int imageRowBytes = outputChannels * w;
    imageRowBytes = imageRowBytes % 4 == 0 ?
        imageRowBytes :
        imageRowBytes + (4 - imageRowBytes % 4); // must be multiple of 4

    for (int y = 0; y < h; y++)
    {
        int c = 0;
        for (int x = 0; x < imageRowBytes; x++)
        {
            if (x < w * outputChannels)
            {
                int inc = c;
                // Convert RGB(A) into BGR(A)
                if (c == 0) inc = 2;
                else if (c == 2) inc = 0;
                bmp.push_back(image[inputChannels * (w * y + x / outputChannels) + inc]);
            }
            else bmp.push_back(0);
            c++;
            if (c >= outputChannels) c = 0;
        }
    }
}
```
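Since both the input and the output use four channels, imageRowBytes is always 4 * w, which is already a multiple of 4, so the padding branch in EncodeBmp never triggers. Under that assumption, the conversion boils down to swapping the red and blue bytes of every pixel. Here is a simplified sketch; RgbaToBgra is a hypothetical name, and it assumes positive dimensions and a buffer of exactly w * h * 4 bytes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical simplified alternative to EncodeBmp for the 32-bit case:
// copy every pixel, swapping R and B so that RGBA becomes BGRA.
std::vector<unsigned char> RgbaToBgra(const unsigned char* rgba, int w, int h)
{
    const size_t numBytes = static_cast<size_t>(w) * static_cast<size_t>(h) * 4;
    std::vector<unsigned char> bgra(numBytes);
    for (size_t i = 0; i < numBytes; i += 4)
    {
        bgra[i + 0] = rgba[i + 2]; // B
        bgra[i + 1] = rgba[i + 1]; // G
        bgra[i + 2] = rgba[i + 0]; // R
        bgra[i + 3] = rgba[i + 3]; // A
    }
    return bgra;
}
```

The resulting vector could be handed to CreateBitmap in the same way as the output of EncodeBmp.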
Then we can use the resulting vector to create our SHDRAGIMAGE and add it to the data object:
```cpp
std::vector<unsigned char> bmp;
EncodeBmp(bmp, pixelData, width, height);
HBITMAP hBmp = CreateBitmap(width, height, 1, 32, &bmp[0]);

// Create drag image
SHDRAGIMAGE dragImage;
dragImage.hbmpDragImage = hBmp;
dragImage.sizeDragImage.cx = (LONG)width;
dragImage.sizeDragImage.cy = (LONG)height;
// Mouse cursor offset
dragImage.ptOffset.x = (LONG)width / 2;
dragImage.ptOffset.y = 10;
dragImage.crColorKey = CLR_NONE;

// Add image to data object with the aid of drag source helper
IDragSourceHelper *pDragSourceHelper;
HRESULT hr = CoCreateInstance(
    CLSID_DragDropHelper,
    NULL,
    CLSCTX_ALL,
    IID_IDragSourceHelper,
    (void**)&pDragSourceHelper);
if (SUCCEEDED(hr))
{
    pDragSourceHelper->InitializeFromBitmap(&dragImage, pDataObj);
    DeleteObject(dragImage.hbmpDragImage);
    pDragSourceHelper->Release();
}
```
I know…
But pDataObj now includes the drag image.
Calling DoDragDrop
Now all that’s left is to construct the drop source and call DoDragDrop:
```cpp
CDropSource* pDropSource = new CDropSource(hwnd);
DWORD dwEffect;
DoDragDrop(pDataObj, pDropSource, DROPEFFECT_COPY | DROPEFFECT_MOVE, &dwEffect);
```
The third parameter indicates which types of drop are allowed, and the fourth is an out value telling us which type of drop was actually performed at the other end. “But wait…”, you say, “does that mean that DoDragDrop is synchronous?” Indeed it is, and that will be our next source of headache.
But with these pieces in place, we are able to call into the native module from our Electron app, and rejoice at the fact that it’s now possible to drag multiple files out and drop them onto any other application!
Frozen vistas
We are calling into DoDragDrop at the moment we detect a drag start. Unfortunately, that leads to our Electron app freezing up completely for the duration of the drag, presumably because of the synchronous nature of said function. No events are delivered whatsoever! Instead, they get buffered up and arrive all at once after the drop has happened. This makes it impossible to highlight drop targets, or make any other updates to the UI in response to drag events. This is clearly not good enough for dragging things within the app.
The Microsoft documentation states that DoDragDrop initiates a drag loop, which calls particular functions on the drop target to notify it of drags entering, leaving etc. For whatever reason, this isn’t working when dragging over an Electron app that itself initiates the drag.
What to do? In order of least effort, these were the three potential workarounds I could think of:
- Use IAsyncOperation to initiate an asynchronous drag.
- Implement our own IDropTarget and override the one installed by Electron.
- Don’t call into the native module until the drag leaves the Fileside window.
IAsyncOperation
The docs mention an IAsyncOperation interface, which sounds like it could be a way forward. On closer inspection, it turns out that the asynchronicity only refers to the process of extracting data from the data object post-drop, and not to the drag itself. Dead end.
Our own drop target
In the Win32 model, an application must implement and register an object conforming to the IDropTarget interface in order to accept drops. Since our native module is technically running inside the Electron process, if we could only switch out whatever drop target has been registered by Electron, we could maybe make sure that the app reacts appropriately to our drag movements?
It turns out that this is actually possible, by calling the Win32 functions RevokeDragDrop followed by RegisterDragDrop, passing in our own drop target instance. And by implementing its methods DragEnter, DragOver and DragLeave, we are able to break the impasse! Our app comes back to life and responds to drag events.
There’s only one problem. And it’s a serious one. How can we pass the data received by the drop target through to Electron for delivery as events over in JavaScript land? The ideal solution would be if we could keep hold of Electron’s original drop target, and then call through to it for delivery of our intercepted events, while retaining control of the return values we give to Windows. Alas, no Windows API exists for retrieving an already registered drop target from a process.
Only start native drag at window boundary
That leaves us with only the iffiest of the three workarounds left to try. This involves starting the drag within the application using JavaScript, and then switching over to the native OS version when the mouse crosses the boundary of the application window. If the drag is dropped outside, we cancel the internal drag. If the drag comes back in without a drop, we cancel the external drag and resume the JavaScript drag.
This all sounds well and good in theory, but once I had it all wired up using the HTML5 drag and drop constructs, I could find no reliable way of cancelling the drag on an outside drop, hence the internal drag continued as soon as the mouse cursor moved back into the window. That made for a pretty broken experience.
Emulate internal drags
To achieve our aim, we need more control over the initiated drag than the drag and drop API gives us. By only registering for mouse events (instead of drag events), we can emulate what the drag and drop API does and thus also decide ourselves when to cancel a drag. The drag and drop API is after all just a convenience abstraction on top of the mouse events.
In our case, the following steps are necessary:
- Use ondragstart for the draggable element but call event.preventDefault() to prevent the browser’s drag and drop implementation from kicking in.
- Register a window listener for mouseout.
- Register document listeners for the events mouseenter, mouseleave, mousemove, mouseup, keydown and keyup.
- Register a handler onDragResult for delivery of the native module’s drag result.
- Manually create a DOM node to use as a drag image and set it to position: absolute.
- On each mousemove, set the drag image’s left and top properties to correspond to the mouse pointer position.
- On each mouseenter and mouseleave, synthesize dragenter and dragleave events and dispatch them to the events’ respective target elements.
- Handle keydown and keyup events to detect modifier keys being held and update the mouse pointer accordingly.
- On mouseout, check whether event.relatedTarget is the root HTML element (its nodeName is “HTML”), and if so, call the native module’s startDrag. Set a flag that we’re now in an external drag.
- Trigger the drop when mouseup is received, if we’re not in an external drag.
- On a drag result indicating an external drop, reset all state related to our internal drag.
See this tutorial for more detail about emulating drag and drop.
Synthesizing the drag events might sound complicated but it essentially just involves copying over the properties of the mouse event into an event of a different type:
```ts
// type is an event name like "dragenter", "dragleave" etc.
// e is a MouseEvent received through a mouse event handler.
function synthesizeDragEvent(type: string, e: MouseEvent): DragEvent {
  return new DragEvent(type, {
    bubbles: true,
    cancelable: true,
    view: window,
    detail: e.detail,
    screenX: e.screenX,
    screenY: e.screenY,
    clientX: e.clientX,
    clientY: e.clientY,
    ctrlKey: e.ctrlKey,
    altKey: e.altKey,
    shiftKey: e.shiftKey,
    metaKey: e.metaKey,
    button: e.button,
    relatedTarget: e.relatedTarget
  });
}
```
One further complication when emulating drags is that any hover states will get triggered on elements over which the drag passes. One way to work around this is to put transparent overlays on top of the areas of the application that contain hoverable elements. These overlays are only in place for as long as a drag is in progress, and they swallow the mouse events, preventing the hover states from being activated.
Switching over
The most reliable way to detect the mouse exiting the browser window is, according to the wise people of the world wide web, the mouseout event being fired on the window object. If we receive such an event and its relatedTarget property equals “HTML”, then we have left the window. This is our signal to trigger the sequence of events that needs to happen to transition into the native drag:
- Hide the element used as a drag image for the internal drag by setting its opacity to 0.
- Create a bitmap with a screenshot of the drag image.
- Call the native module’s startDrag with the array of dragged paths, the bitmap, and its width and height.
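Here is a minimal sketch of this hand-over. It assumes a hypothetical NativeDragModule interface whose startDrag takes the paths, the RGBA pixels, and the width and height, and returns a result string. It also assumes a hypothetical capture callback that stands in for the screenshot step. The three steps run in the order listed above.

```typescript
// Hypothetical interface of the native Node module; on Windows, startDrag
// returns the drag result once the OS drag has finished.
interface NativeDragModule {
  startDrag(paths: string[], pixels: Uint8Array, width: number, height: number): string;
}

// A captured drag image: RGBA pixels plus their dimensions.
interface Bitmap {
  pixels: Uint8Array;
  width: number;
  height: number;
}

// The only part of the drag image element this sketch touches.
interface DragImageNode {
  style: { opacity: string };
}

async function switchToNativeDrag(
  dragImage: DragImageNode,
  paths: string[],
  capture: () => Promise<Bitmap>,
  native: NativeDragModule
): Promise<string> {
  // 1. Hide the internal drag image.
  dragImage.style.opacity = "0";
  // 2. Turn the drag image into an RGBA bitmap.
  const { pixels, width, height } = await capture();
  // 3. Hand the paths and bitmap over to the native drag.
  return native.startDrag(paths, pixels, width, height);
}
```

Passing capture and the module in as parameters keeps the sketch testable with fakes. In the app, capture would wrap the screenshot step described next.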
For creating the bitmap, we can use the dom-to-image library and its toPixelData() function. This gives us the required Uint8Array of RGBA pixels that can be passed to the native module.
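The native module only receives a flat byte array. A cheap size check before crossing into C++ can therefore catch a mismatch before it turns into an out-of-bounds read on the other side. This sketch assumes tightly packed, row-major RGBA data (4 bytes per pixel, no row padding), which is how canvas pixel data is laid out. pixelOffset, assertRgbaBuffer and readPixel are hypothetical helpers.

```typescript
// Byte offset of pixel (x, y) in a row-major RGBA buffer of the given width.
function pixelOffset(x: number, y: number, width: number): number {
  return (y * width + x) * 4;
}

// Throws a RangeError unless `pixels` holds exactly width * height RGBA pixels.
function assertRgbaBuffer(
  pixels: Uint8Array | Uint8ClampedArray,
  width: number,
  height: number
): void {
  if (!Number.isInteger(width) || !Number.isInteger(height) || width <= 0 || height <= 0) {
    throw new RangeError(`invalid bitmap size ${width}x${height}`);
  }
  const expected = width * height * 4;
  if (pixels.length !== expected) {
    throw new RangeError(`expected ${expected} bytes, got ${pixels.length}`);
  }
}

// Reads the R, G, B and A components of pixel (x, y).
function readPixel(
  pixels: Uint8Array | Uint8ClampedArray,
  x: number,
  y: number,
  width: number
): [number, number, number, number] {
  const i = pixelOffset(x, y, width);
  return [pixels[i], pixels[i + 1], pixels[i + 2], pixels[i + 3]];
}
```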
The drop
Now, if a drop happens outside of the app, we need to communicate that back from our native module, so that we can cancel our internal drag on the JavaScript side. The same applies if the external drag is cancelled by pressing Escape.
On Windows, the drag result can be communicated back to Electron by just returning it from startDrag. Exactly how to calculate it in the native module requires some thought, however.
DoDragDrop tells us directly whether the drop was a copy or a move, but we also want to detect whether the external drag moved back in over the application window, in which case we need to cancel it (to prevent another deep freeze) and return a drag result indicating re-entry. The internal drag code can then just pick up where it left off and continue the internal drag.
The code for detecting this is worth having a look at in some detail. It lives in our drop source’s implementation of QueryContinueDrag:
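HRESULT CDropSource::QueryContinueDrag(BOOL fEscapePressed, DWORD grfKeyState)
{
    POINT pointerPos;
    if (GetCursorPos(&pointerPos))
    {
        // Find the top-level window under the pointer.
        HWND hwndUnderPointer = WindowFromPoint(pointerPos);
        HWND rootHwndUnderPointer = GetAncestor(hwndUnderPointer, GA_ROOT);
        if (rootHwndUnderPointer == mHwnd)
        {
            if (mHasBeenOutside)
            {
                // Back over our own window: end the native drag so the
                // internal drag can take over again.
                mDidReEnter = true;
                return DRAGDROP_S_CANCEL;
            }
            else
            {
                // Not properly outside our window yet: keep dragging.
                return S_OK;
            }
        }
        else
        {
            mHasBeenOutside = true;
        }
    }
    // Escape cancels the drag.
    if (fEscapePressed)
    {
        mCancelled = true;
        return DRAGDROP_S_CANCEL;
    }
    // Neither the left nor the right button is held any more: drop.
    if (!(grfKeyState & (MK_LBUTTON | MK_RBUTTON)))
        return DRAGDROP_S_DROP;
    return S_OK;
}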
This function is called repeatedly by the OS to give our drop source a say in whether the drag should continue given certain conditions. The mHasBeenOutside boolean is initialised to false and is needed at the beginning of the external drag, since Windows and the browser don’t quite agree on what the boundaries of the window are. The area just outside of it still seems to be considered part of the window by Windows, so we use this boolean to avoid triggering re-entry immediately upon leaving the application window.
Once we’ve been outside and we again detect that the root window under the mouse is our mother HWND, we set another internal boolean mDidReEnter to true and return DRAGDROP_S_CANCEL to inform Windows that we’ve lost our will to live as this particular drag incarnation.
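To make the branching in QueryContinueDrag easier to follow, here is the same decision logic restated as a pure TypeScript function. It is a model, not the COM callback itself:
- The PollInput fields are hypothetical stand-ins for the GetCursorPos/WindowFromPoint/GetAncestor check, fEscapePressed and grfKeyState.
- The string results stand in for S_OK, DRAGDROP_S_DROP and DRAGDROP_S_CANCEL.
- The state object plays the role of the mHasBeenOutside, mDidReEnter and mCancelled members.

```typescript
type ContinueDecision = "continue" | "drop" | "cancel";

// Mirrors the drop source's member variables.
interface DropSourceState {
  hasBeenOutside: boolean;
  didReEnter: boolean;
  cancelled: boolean;
}

interface PollInput {
  // null models GetCursorPos failing, in which case the window check is skipped
  pointerOverOurWindow: boolean | null;
  escapePressed: boolean;
  mouseButtonDown: boolean;
}

function queryContinueDrag(s: DropSourceState, input: PollInput): ContinueDecision {
  if (input.pointerOverOurWindow !== null) {
    if (input.pointerOverOurWindow) {
      if (s.hasBeenOutside) {
        // Re-entry: end this native drag so the internal one can resume.
        s.didReEnter = true;
        return "cancel";
      }
      // Still in the border zone right after leaving: keep going.
      return "continue";
    }
    s.hasBeenOutside = true;
  }
  if (input.escapePressed) {
    s.cancelled = true;
    return "cancel";
  }
  // No mouse button held any more: drop.
  if (!input.mouseButtonDown) return "drop";
  return "continue";
}
```

As in the C++ version, Escape and button release are not examined while the pointer is still over the app before it has ever left.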
The main startDrag function can then query the drop source for its state after DoDragDrop returns and return the appropriate drag result.
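The mapping from DoDragDrop's outcome and the drop source's state to a drag result might look like the following, again modelled in TypeScript. DRAGDROP_S_DROP, DRAGDROP_S_CANCEL, DROPEFFECT_COPY and DROPEFFECT_MOVE carry their standard Win32 values. The DragResult names and the toDragResult function are hypothetical. Re-entry has to be checked first, because QueryContinueDrag reports it to Windows as a cancel.

```typescript
// Standard Win32 values from the Windows SDK headers.
const DRAGDROP_S_DROP = 0x00040100;
const DRAGDROP_S_CANCEL = 0x00040101;
const DROPEFFECT_COPY = 1;
const DROPEFFECT_MOVE = 2;

// Hypothetical result values handed back to JavaScript.
type DragResult = "copied" | "moved" | "reentered" | "cancelled" | "none";

// hr is DoDragDrop's return value, effect its *pdwEffect output, and
// didReEnter the drop source's mDidReEnter flag.
function toDragResult(hr: number, effect: number, didReEnter: boolean): DragResult {
  // Re-entry also ends in DRAGDROP_S_CANCEL, so it must be checked first.
  if (didReEnter) return "reentered";
  if (hr === DRAGDROP_S_CANCEL) return "cancelled";
  if (hr === DRAGDROP_S_DROP) {
    if (effect & DROPEFFECT_MOVE) return "moved";
    if (effect & DROPEFFECT_COPY) return "copied";
  }
  // Dropped on a target that accepted nothing, or DoDragDrop failed.
  return "none";
}
```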
Yay, it works! Kinda
Phew. After all that, we now have working drag and drop on Windows, with a relevant drag image and support for modifier keys.
All is however still not quite perfect. If we drag out of the app and then back in again without dropping, the internal drag resumes. But if we then try to move back out once more during the same drag, the required mouseout event just doesn’t get delivered, and the drag is trapped. I haven’t been able to figure out why this happens yet, but since this particular interaction is probably quite unlikely during normal use, we can live with it for the time being.
Another lingering glitch shows up if we first make a drag out of our app into Explorer and drop it. Then grab something else from the same Explorer window and drag it back into Fileside. Once we cross the boundary into the app window, the Explorer-provided drag image is replaced by a stretched version of whatever the previous drag image generated by the app was. Just dragging across the boundary once more fixes it, but it’s an unfortunate cosmetic blight, which is yet to find a proper fix.
The Mac module
Accomplishing the same thing on MacOS is an altogether easier feat. Instead of dealing with hairy old monsters from the 1980s, we’re dealing with the Cocoa API, which, despite also having its roots in that glorious decade, is a much more friendly beast, in large part due to Apple’s very different approach to backwards compatibility. In the Apple universe, APIs get maintained, updated and deprecated in due course. On the one hand this puts some burden on developers to follow along and update their code, but on the other it makes for a much smoother development experience.
The only slight oddity here is having to deal with Objective-C, but it’s very much in the same family as C and C++ (we can even mix the three freely in an Objective-C++ source file with the extension .mm). All we really need to know for this exercise is that methods on objects are called like this: [object methodWithArgument: argument], and that there’s a very helpful website out there for dealing with Objective-C’s version of anonymous functions (or closures), known as blocks.
Cocoa’s drag and drop APIs went through an update a few years ago, and the official documentation for the new API does unfortunately leave something to be desired. This tutorial from raywenderlich.com was the most comprehensive I found for the modern API, and despite being written in Swift, is still a great help in explaining the concepts involved.
Plan of action
On the Mac, we don’t have the problem of the app freezing up when initiating a drag, so we can just call the native startDrag immediately on detecting a drag start; no need for all that iffy switching around when going in and out.
Here’s an outline of the steps involved:
- Parse input arguments.
- Prepare an NSDraggingItem for each file, containing its path and an icon.
- Create an object conforming to the NSDraggingSource protocol to control the drag.
- Synthesize an NSLeftMouseDragged event containing the current window and mouse position.
- Call [NSView beginDraggingSessionWithItems:event:source:], giving it an array of NSDraggingItems, the dragging source and the synthesized drag event.
Once it’s been given a list of NSDraggingItems, the OS will arrange them in a neat list (which we can also customise via the draggingFormation property of the NSDraggingSession object returned by beginDraggingSessionWithItems) that follows the mouse pointer around. It would of course also be possible to provide a custom drag image here, but the OS does an elegant enough job of this with the native file icons that we don’t really need it.
The NSDraggingItem
The NSDraggingItem takes a file path in the form of an NSURL and an icon representing it as an NSImage.
It’s important that we set the image through the dragging item’s imageComponentsProvider, with its slightly tricky block syntax, and not through the simpler setDraggingFrame:contents:. This allows the OS to optimise the retrieval of the images, so as not to get bogged down when initialising a drag with a large number of files.
Assuming files is a std::vector of std::strings in UTF-8 format, this is how an array of NSDraggingItems is created:
// Manual reference counting (no ARC): objects created in the loop are
// autoreleased so that each one is owned only by its container.
NSMutableArray* dragItems = [[NSMutableArray alloc] init];
for (const auto& file : files) {
  NSString* nsFile = [NSString stringWithUTF8String:file.c_str()];
  NSURL* fileURL = [NSURL fileURLWithPath: nsFile];
  NSImage* icon = [[NSWorkspace sharedWorkspace] iconForFile:nsFile];
  NSSize iconSize = NSMakeSize(32, 32); // according to documentation
  NSArray* (^providerBlock)(void) = ^NSArray*(void) {
    NSDraggingImageComponent* comp = [[[NSDraggingImageComponent alloc]
        initWithKey: NSDraggingImageComponentIconKey] autorelease];
    // The x, y here seem to control the offset from the mouse pointer
    comp.frame = NSMakeRect(0, 0, iconSize.width, iconSize.height);
    comp.contents = icon;
    return @[comp];
  };
  NSDraggingItem* dragItem = [[[NSDraggingItem alloc]
      initWithPasteboardWriter: fileURL] autorelease];
  // The x, y here determine from what point the images fly in at the beginning
  // of the drag. The size determines the space each DraggingImage has, so can
  // be used to create overlapping icons or spacing between them.
  dragItem.draggingFrame = NSMakeRect(
      mousePos.x, mousePos.y, iconSize.width, iconSize.height);
  dragItem.imageComponentsProvider = providerBlock;
  [dragItems addObject: dragItem];
}
The NSDraggingSource
This is the MacOS equivalent of the Win32 IDropSource object, and here we need to implement two methods:
- (NSDragOperation)draggingSession:(NSDraggingSession *)session
    sourceOperationMaskForDraggingContext:(NSDraggingContext)context;
- (BOOL)ignoreModifierKeysForDraggingSession:(NSDraggingSession *)session;
The second one will just return NO (Objective-C speak for false), but the first one is of some importance. It allows us to specify which drag operations should be permitted. Here I only arrived at the required combination of flags through a process of trial and error:
- (NSDragOperation)draggingSession:(NSDraggingSession *)session
    sourceOperationMaskForDraggingContext:(NSDraggingContext)context
{
  // This combination of flags gives the behaviour we want, somehow:
  // - it uses move pointer by default (no plus)
  // - plus appears when pressing Alt and drop is allowed
  // - pointer stays unchanged when pressing Cmd and drop is allowed
  // - pointer stays unchanged when holding Ctrl and drop is not allowed
  //
  // If using NSDragOperationEvery, this is not the case as we then get
  // the plus pointer by default.
  return NSDragOperationCopy |
         NSDragOperationMove |
         NSDragOperationGeneric |
         NSDragOperationDelete;
}
Synthesizing the drag event
This is slightly more involved than synthesizing an event in the browser. Here’s what it looks like:
NSEvent* SynthesizeEvent(NSView* view) {
  NSWindow* window = [view window];
  NSPoint position = [window mouseLocationOutsideOfEventStream];
  NSTimeInterval eventTime = [[NSApp currentEvent] timestamp];
  NSEvent* dragEvent = [NSEvent mouseEventWithType: NSLeftMouseDragged
                                          location: position
                                     modifierFlags: NSLeftMouseDraggedMask
                                         timestamp: eventTime
                                      windowNumber: [window windowNumber]
                                           context: nil
                                       eventNumber: 0
                                        clickCount: 1
                                          pressure: 1.0];
  return dragEvent;
}
The view parameter is the application window reference passed across from Electron.
Putting it all together
Then we use the pieces we’ve created and initiate the drag:
DraggingSource* customSource = [DraggingSource new];
NSEvent* dragEvent = SynthesizeEvent(view);
NSDraggingSession* session = [view
    beginDraggingSessionWithItems: dragItems
                            event: dragEvent
                           source: customSource];
session.draggingFormation = NSDraggingFormationList;
And there we are. Drag and drop on Mac is done.
The pesky plus
An extra hack was needed back in JavaScript land to prevent the mouse pointer from showing a plus by default when hovering over non-droppable areas. Adding an ondragover handler to the root-level element in our app and having it call event.preventDefault() fixes this. It seems to prod macOS into taking the allowed NSDragOperations from our NSDraggingSource into account.
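In code, the workaround is a single listener. This sketch wraps it in a hypothetical installDragOverFix helper that accepts any EventTarget and returns a function that removes the listener again. In the app, root would be the app's root-level element.

```typescript
// Calls preventDefault() on every dragover event that reaches `root`.
// Returns a function that removes the listener again.
function installDragOverFix(root: EventTarget): () => void {
  const onDragOver = (event: Event): void => {
    // Cancelling dragover is the whole workaround: it seems to make macOS
    // respect the NSDraggingSource's operation mask (no default plus).
    event.preventDefault();
  };
  root.addEventListener("dragover", onDragOver);
  return () => root.removeEventListener("dragover", onDragOver);
}
```

Because the helper only needs an EventTarget, it can be exercised outside the browser. Dispatching a cancelable dragover event should then report that the default was prevented.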
Not even MacOS is perfect
Even with the hack above, there are occasional flashes of plus icon when dragging something around within the app.
And when dragging something out of the app, the pointer icon sometimes doesn’t immediately update on pressing a modifier key, but requires a slight wiggle of the mouse to update. This feels buggy; however, the exact same thing happens when dragging a file from Xcode’s file tree out to Finder, so the bug is probably not ours.
The end
So there we are. Now you know what to do if you really, really, really want working drag and drop from an Electron app on Windows and Mac. In hindsight, forking Electron and modifying its startDrag to support drag and drop properly might actually have been the less time-consuming route, but maintaining a fork of a huge framework like Electron is not something to be taken lightly. At least our stand-alone solution spares us that maintenance burden.